The report proposes to complete the validation and refinement of a new
domain-referenced testing technology designed to assess literal
comprehension ability in students in grades 1-12. The domain-referenced
measures in this technology, along with other more traditional measures
of reading comprehension, literal and non-literal, are subsequently
intended to be used in part in large-scale studies of productivity in
school reading programs. To date, studies of productivity in reading
instruction have had little influence on educational decision-making
due to serious methodological problems, one of the major problems being
the lack of adequate measures of program output. The report further
proposes to solve a number of important instructional management
problems created by the use of the inadequate information available
from traditional measures of reading comprehension. The new
domain-referenced measures of reading comprehension will have an
improved basis for scaling students on comprehension ability, and
ability scores from this scale will be referenced to an additional
scale defining an individual or group's ability to read in several
domains of written discourse. These scaling features will allow for the
assignment of students to specific levels of reading materials in
specific instructional or content domains, a procedure not possible
with existing measures of reading comprehension. (Author)
ABSTRACT



The report proposes to complete the validation and refinement of a
new domain-referenced testing technology designed to assess literal
comprehension ability in students in grades 1-12. The domain-referenced
measures in this technology, along with other more traditional measures of
reading comprehension, literal and non-literal, are subsequently intended
to be used in part in large-scale studies of productivity in school reading
programs. To date, studies of productivity in reading instruction have
had little influence on educational decision-making due to serious
methodological problems, one of the major problems being the lack of
adequate measures of program output.

The report further proposes to solve a number of important instructional
management problems created by the use of the inadequate information
available from traditional measures of reading comprehension. The new
domain-referenced measures of reading comprehension will have an improved
basis for scaling students on comprehension ability, and ability scores
from this scale will be referenced to an additional scale defining an
individual or group's ability to read in several domains of written
discourse. These scaling features will allow for the assignment of
students to specific levels of reading materials in specific instructional
or content domains, a procedure not possible with existing measures of
reading comprehension.

STATUS OF THE PROJECT:

The proposed work has evolved through several years of research and
development on major issues relating to the assessment of school
achievement. Prior efforts relating to the present work include the
preparation of a bank of instructional objectives defining reading
performance, the development and validation of a criterion-referenced
evaluation system known as Comprehensive Achievement Monitoring, and
experimentation with the measurement of resource utilization in reading
programs as an initial attempt to improve the methodology for productivity
research.

Most recently, this research has turned to the development of more
adequate measures of reading outcomes -- a major gap remaining in
productivity methodology. The intent is to produce a test development
resource that will be useful at a variety of institutional levels and a
measure that will be unique in at least two major respects: (1) it will be
a measure of both comprehension achievement and ability, and (2) it will be
the only extant and broadly applicable measure of literal comprehension as
such -- the important, generalized reading skill that underlies all
higher-order reading comprehension abilities. Some two years of development
effort have culminated in new measures of reading comprehension that are
referenced to several major domains of reading materials relevant to
students in grades 1-12. These measures are the components of a flexible
test-assembly device referred to as the Test Development Notebook or TDN.
The TDN, as currently conceived, is a resource for the assembly of measures
of literal comprehension in grades 1-12, across all major content domains
relevant to the school population.

The content of the TDN developed to this point consists of the
multiple-choice cloze component and an alternate measure of the construct
of literal comprehension based on the wh-item. The multiple-choice cloze
component (referred to as the MCC) consists of approximately 1,[?]00 clozed
passages (generally, 60-70 word passages with ten deletions and
accompanying multiple-choice items) categorized (temporarily) by
readability levels determined by Spache and Dale-Chall readability
formulas. The wh-/main idea item pool consists of 300 passages, 15 at each
of 20 readability levels. Passage length varies systematically by
readability level (e.g., approximately 25 words at level 1 and up to 220
words at levels 17-20). Each of these passages is accompanied by as many as
four multiple-choice main idea items and up to eight multiple-choice
wh-detail items modeled after Bormuth's (1970) wh-items. The formats of the
cloze and wh- materials are both objective generative procedures for
preparing numbers of parallel, multiple-choice items.
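
The MCC passage format described above (a short passage with ten deletions) can be illustrated with a brief Python sketch. It is only an illustration under stated assumptions: the report does not say here which words are deleted or how the multiple-choice options are written, so the evenly spaced deletion rule, the `make_cloze` name, and the blank marker are all hypothetical.

```python
def make_cloze(passage: str, n_deletions: int = 10) -> tuple[str, list[str]]:
    """Blank out n_deletions evenly spaced words.

    The even spacing is an assumed rule for illustration, not the
    report's documented procedure.
    """
    words = passage.split()
    if n_deletions < 1 or len(words) < n_deletions:
        raise ValueError("passage is too short for the requested deletions")
    step = len(words) // n_deletions  # assumed spacing between deletions
    # Delete the last word of each consecutive block of `step` words.
    positions = [step * (i + 1) - 1 for i in range(n_deletions)]
    answers = [words[p] for p in positions]
    for number, p in enumerate(positions, start=1):
        words[p] = f"__({number})__"
    return " ".join(words), answers


# A 60-word dummy passage (w1 ... w60) stands in for real reading material.
passage = " ".join(f"w{i}" for i in range(1, 61))
clozed, answers = make_cloze(passage)
print(clozed)
print(answers)
```

Each deleted word becomes the keyed answer to one multiple-choice item; writing the distractors is outside the scope of this sketch.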

The first field test of the MCC and wh-item tests was conducted in May
1975, in an administration of both types of tests in a survey design to
approximately 5,000 students spread more or less evenly over grades 1-9.
This administration fulfilled several purposes: (1) it explored the use of
the testing materials in applying a survey design in one textual area;
(2) it provided data for detailed item analyses; (3) it provided an initial
test of the ability of the system in assembling large numbers of parallel
test forms; (4) it provided a basis for testing out the Rasch or
latent-trait model as an approach to scaling; (5) it made available
reliability data on a large number of test forms; and (6) it provided
initial convergent and discriminant evidence on the validity of the
construct. The more important conclusions that were drawn from the field
test are as follows:
1. The existing paper-based model of the TDN allowed assembly
of 36 50-item MCC test forms in a matter of a few hours.



2. The application of the survey design model in grades 1-9
was generally successful for both the MCC and wh-item tests,
but the design can be improved in the future by raising
the ceiling of readability for upper-grade test batteries.

3. The item analysis data showed that the MCC item format, as
applied to a given reading passage, generally yielded a set
of items that were consistent and homogeneous within and
between passages, regardless of passage level. (The data
provided many important leads as to how the homogeneity of
items within passages might be improved, but, in general,
extensive improvements were not required.)

4. Large numbers of virtually parallel tests could be
systematically assembled from the TDN from either the MCC or
wh-item components. With improved scaling, the possibility of
objectively assembling tests with specified properties is assured,
thus providing for transferability of test generation.

5. The experimental application of the Rasch model to 216 MCC
test passages showed that the ratio scale properties of
this model could be achieved with the item form.

6. Analyses of the reliability of the MCC test forms showed that
the tests assembled for the study were highly precise across
all grade levels in the study sample. The level of precision
is sufficiently high to warrant use of the tests at the
individual level. The reliability data further support the
inference that the MCC test is reliable over short intervals (i.e.,
alternate forms of the same test will scale individuals
similarly on test-retest with a high degree of precision). The
reliability characteristics of the wh-item tests were similar
to those achieved with the MCC test.

7. There were several indications of support of the construct
validity of the cloze test in the data analysis. The internal
consistency measures and the Rasch analyses indicated the MCC
test could be accurately described as measuring a homogeneous
trait across grades 1-9. The validity coefficients between
the MCC test and the wh-item test, an alternate measure of
the construct, were consistently high (r = .81 at grades 1-3),
except where attenuated by range of talent. The MCC test
generally correlated at appropriate levels with measures of
verbal and non-verbal IQ, California Achievement Test (CAT)
subscores in language and reading, and a measure of passage
dependency. The MCC and wh-item tests converged in having
virtually identical correlations with the CAT subscores, the
IQ scores, and the score on passage dependency. Overall,
the results were highly consistent across the 9 grade levels,
lending considerable credibility to the validity of the MCC
test.
8. The analyses of these field-test data continue to date, as
well as use of the data to refine the MCC corpus of passages.
Of particular interest is a factor analysis of the test data
to be run shortly.

PROPOSED RESEARCH AND DEVELOPMENT

The data analyses on the reliability and validity of the MCC and
wh-item formats continue to date. More detailed results, including factor
analyses, will be reported in a series of papers at the annual conference
of the American Educational Research Association and the National Council
on Measurement in Education this spring. The overall results to date,
together with reviews by a panel of well-known professionals in reading,
psycholinguistics, and educational measurement, have amply demonstrated
the desirability of completing the proposed work on the testing materials.

The proposed work on the testing materials is designed to bring the
TDN to a state where it can be used as a valid assessment device in a
variety of evaluation contexts at state and local levels. The research
effort will continue the study of the reliability and content validity of
the testing materials, but will focus largely on construct validation,
scaling, and packaging.

Construct Validation

The proposed approach for further validation and refinement of the
testing materials is a series of concurrent efforts designed both to study
the meaning of the tests and to bring them to a broadly usable state. A
set of preliminary studies will focus on further refining the MCC test
format (the measure of major interest) in preparation for a cross-sectional,
longitudinal study of test validity in a sample of approximately 13,000
students in grades 1-12.

The preliminary validity studies will generally determine the boundaries
of written discourse to which an MCC test score can be expected to
generalize (i.e., Does the meaning of the test score change when passages
vary extensively in terms of syntactic and semantic complexity or content
area?). In addition, specific features of the item format and the
conditions of test administration will be studied to determine any
additional refinements that might be made to the test.

The major effort of the proposed validation -- the cross-sectional,
longitudinal study -- will examine the boundaries of the construct of
literal comprehension in an expanded matrix of different textual,
psycholinguistic, situational, and psychological factors. The longitudinal
study will be conducted in a single urban school district that will
contribute a heterogeneous sample of more than 1,000 students in each grade
from 1 through 12. The design of the study will provide a developmental
context within which the contributions of important school and non-school
factors to the MCC test, the wh-item test, and other measures of reading
comprehension can be studied across the 12 years of public schooling. The
extent to which the various measures of reading comprehension change across
the years of schooling can be estimated through this design, as well as the
proportion of test score change that is attributable to manipulable
factors, such as reading experiences in the home or school. Since
standardized measures of reading comprehension will also be available for
grades 1-9 of the study population, the design will enable a direct and
critical comparison of the sensitivity of the various comprehension
measures in accounting for the influences of instruction and related
experience.

Scaling 

Completed work on applying the Rasch model to the TDN passage and
item corpus supports the present proposal to calibrate all such passages
on a single underlying scale with ratio properties. This application of
scaling involves mounting a complex linking design in which both the MCC
and wh-item pools will be calibrated using a sample of approximately
50,000 students in grades 1-12. The proposed design will result in the
calibration of all test passages in the various content domains covered
by both tests on a common Rasch scale. Then all of the many tests that
can be assembled from the MCC content domains will be referenced to the
same scale.

The proposed major calibration of the test passages will be preceded
by a pilot study in which the complexities of the linking design will be
worked out by experimental application of the Rasch model to the MCC
passages in several content areas outside the basal reader area. The
proposed research on scaling further includes the construction of derived
scores for the MCC test and the establishment of formal procedures for
linking Rasch ability scores with the distributions of readability in
related domains of materials.
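
For readers unfamiliar with the Rasch model referred to in this section, a minimal Python sketch may help. In its standard dichotomous form, the probability of a correct response depends only on the difference between a person's ability and an item's difficulty, both expressed in logits on one scale. The function name and the example values below are illustrative assumptions; they are not taken from the project's calibration data.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values in logits: one student, three calibrated items.
ability = 1.0
for difficulty in (-1.0, 1.0, 3.0):
    p = rasch_probability(ability, difficulty)
    print(f"difficulty {difficulty:+.1f}: P(correct) = {p:.3f}")
```

Because persons and items are located on the same scale, an item's calibrated difficulty does not depend on which test form it appears in; this is the property that lets separately assembled tests be referenced to a common scale.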

EXPECTED CONTRIBUTIONS:

The project is expected to make a number of theoretical and practical
contributions to improved evaluation in reading and ultimately to improved
instruction and better resource allocation at several levels of the
educational enterprise. Concurrent with the validity studies proposed for
the testing materials, a program will be mounted to gradually transform the
TDN into a state of broad practical utility. The principal elements of this
program include computerization of the processes of test item generation
and test assembly (the former process applicable to the cloze format only)
and the preparation of textual materials presenting simulations and
guidelines for application of the testing materials in a variety of
evaluation contexts. The specific products expected from this and other
components of the proposed research and development are:

1. A testing package (the TDN) with a finalized version
of the multiple-choice cloze and wh-item testing
materials along with a handbook and training materials
for its use.

2. A technical report on the readability and other
characteristics of reading materials in the domains
covered by the testing materials.

3. A technical report on the use of the testing materials
in a pilot productivity study.

4. A report or book on the validity of major norm-referenced
tests of reading comprehension (S[?], ITBS, etc.)
from the point of view of theory and content.

5. Periodic and final reports on the activities conducted
and the results obtained during the funding period.

PROJECT MANAGEMENT:

The research and development proposed here will be conducted by the
Bureau of School and Cultural Research, a unit that has six years of
experience in the development of criterion-referenced testing in both
reading and mathematics. With the aid of nationally known consultants in
certain highly specialized areas, such as scaling and decision theory, the
Bureau will assemble a technically and professionally competent staff for
the proposed task.

The objectivity and technical adequacy of the Bureau's proposed and
completed work on the task will be maintained by periodic external review
by a panel of nationally known experts in such fields as psycholinguistics,
cognitive development, reading theory, psychometrics, computer technology,
and statistics. The required technical facilities for completing the task
exist in the Education Department.
INTRODUCTION

Dismal reports on the level of literacy in American schools and
colleges -- and in the nation as a whole -- appear with relentless
regularity in magazines and newspapers. Statements on alarmingly high
levels of "functional illiteracy" and declines in student reading
achievement abound in spite of efforts to upgrade reading performance
through massive expenditures on ESEA Title I programs, the Right to Read,
and other special projects. With this contradiction of increased effort and
diminishing returns, questions may be raised as to the bases on which
judgements are made. What is meant by "literacy," and how is the
achievement of specifiable levels of literacy measured?

Assuming that "literacy" refers to minimal competence in reading
comprehension, what is lacking is a valid, accurate measure of literacy, a
means of determining minimum competency in reading comprehension. If so,
those who are concerned with the state of reading in America -- and with
productivity in American schools -- should first be concerned with the
availability of appropriate measures of literacy-related outcomes of school
reading programs.

This report is concerned with just such a measure. Its focus is
the development of an accurate, useful, and economical test of literal
comprehension, a fundamental reading skill and the skill involved in what
is usually meant by functional literacy. The particular innovative measure,
the subject of the report, is the SPPED multiple-choice cloze format.

The first four chapters of this report present a theoretical rationale
for the SPPED multiple-choice cloze. They contain a critique of traditional
measures of reading comprehension, a discussion of psycholinguistic theory
relative to efforts to measure comprehension, a brief discussion of the
conventional cloze procedure as a test of comprehension, and a statement
of a tentative construct of literal comprehension, including one of its
operationalizations in the multiple-choice cloze format developed for the
SPPED Test Development Notebook. The fifth chapter describes the
advantageous properties of the SPPED multiple-choice cloze which would make
it a broadly useful as well as critically important tool for measurement
and evaluation.

An overview of the research to date and of the future research and
development planned for the multiple-choice cloze and related materials is
presented in Chapter VI. Chapter VII outlines a detailed plan for
calibrating the multiple-choice cloze passages on a ratio scale based on
application of the Rasch model to the cloze testing materials. Together,
these plans are designed to bring the cloze testing materials to a broadly
usable state in policy research on reading and in the management of reading
instruction.

The eighth chapter provides a detailed discussion of both conventional
and Rasch item-analysis data available from a preliminary administration of
the multiple-choice cloze testing materials to a sample of 5,000 students
in grades 1-9. These item analysis data and critical examination of the
testing materials show that departures from the expected characteristics
of the testing materials are infrequent. Further, current review of all
extant multiple-choice cloze materials promises to diminish an already low
incidence of errors in procedures and execution.

The ninth chapter, as well as part of the seventh, reports research
to date which suggests that the preliminary test development and
administration of the SPPED multiple-choice cloze has been highly
successful. The cloze testing materials and an alternate measure of literal
comprehension developed for the research, called the wh-item test, were
shown
to be highly internally consistent in the study population. The preliminary
research data further provided substantial indications of the validity
of the cloze format. Overall, results of the research show that the
hypothetical advantages proposed for the multiple-choice cloze format (e.g.,
CHAPTER I

STANDARDIZED NORM-REFERENCED TESTS OF COMPREHENSION

As noted in the Introduction to this report, the American people will
always hold their school systems accountable for the literacy of students.
Yet teachers have been provided with neither an acceptable standard of
literacy nor the tools to measure the basic reading abilities implied by
"literacy." Standardized, norm-referenced tests of reading comprehension,
in spite of their many disadvantages, certainly have a place in educational
testing,¹ but they are entirely inadequate as measures of ability or
achievement in literal comprehension. In the first place, the standardized,
norm-referenced tests used to measure achievement in reading comprehension
are patently biased toward a conceptualization of reading as reasoning.
Besides this, they present at least four additional problems: (1) they are
too subjective in construction to be reproduced or properly validated;
(2) their scaling properties make scores difficult to interpret; (3) they
are insensitive to individual gain or growth; and (4) they are rigid in
format and therefore limited in utility.

Test-makers seldom specify the conceptualization of comprehension behind
their tests, much less the psycholinguistic theory and experimental evidence
supporting such a conceptualization. Finding no explicit constructs,
other than labels on subsections of the tests, researchers are forced to
use various analytical techniques to tease out the notion of comprehension
from test scores.²

¹ Primarily as predictors of academic success (Anderson, 1972; Carver,

However, it is not inappropriate for the consumer of such tests to ask
why there is no explicit statement of what it is the test attempts to
measure and, given no explanation, to speculate on the reason for its
absence. Messick (1975), for instance, notes a long-standing bias against
construct validation in educational measurement. Test-makers seem to assume
that "educational measurement is primarily concerned with what a pupil can
do, and [that] the nature of the accomplishment is clear from the
specification of the tasks" (p. 958). But the very terms used to label
tests and to interpret test scores "imply process interpretations, such as
scientific reasoning or reading comprehension" (p. 958). Popham (1975)
argues (less generously) that the commercial publishers who create and
market standardized tests "are loath, from a marketing viewpoint, to spell
out exactly what their exams measure" because the tests must be marketed
nationally, and "many educators would find them inconsistent with local
instructional programs" (sec. 2, p. 4). In any event, without an explicit
statement of theoretical and empirical relationships, "the burden of
construct validation [is foisted] onto the consumers who will inevitably
make inferences beyond the universe of situations representatively sampled
by the test" (Cronbach, 1971, p. 483). Instead of stating an explicit
construct, which is subject to rival interpretations, the publishers of
standardized comprehension tests usually present the consumer with
correlations between standardized tests of the same ilk. But correlations
between equally ambiguous tests are little solace to consumers who would
like to know what any of them actually measure.

² For a recent, critical review of psychometric research on comprehension
in reading, see Davis (1972).

While standardized comprehension tests are notorious for their lack
of explicitly stated constructs, most seem, in fact, based on Thorndike's
(1917) introspections on reading. In his conceptualization of comprehension,
Thorndike made no distinction between reading and thinking:

    Understanding a . . . printed paragraph is [then] a matter of
    habits, connections, mental bonds, but these have to be
    selected from so many others, and given weights so
    delicately, and used together in so elaborate an organization
    that "to read" means "to think" as truly as does "to
    evaluate" or "to demonstrate" or "to verify." (p. 114)

Not only did Thorndike find reading and thinking conceptually
indistinguishable, but the extension of the comparison to evaluation,
demonstration (proof), and verification implies the equation of reading and
"high order" thinking or reasoning processes:

    The successful response to a question or to a paragraph's
    meaning implies the restraint of the tendencies of many words
    to be over-potent and the special weighting of other
    tendencies. This task is quite beyond the power of weak
    minds and is of the same selective and coordinating nature
    as the more obvious forms of reasoning in mathematics or
    science. (p. 114)

Thorndike's conceptualization of reading as a thinking or reasoning process
has had enormous influence on the teaching of reading (Davis, 1972) and the
construction of comprehension tests.

Now few people will deny that comprehension involves thinking (processing
information) or that critical/evaluative reading and reasoning share some
intellectual skills (e.g., deductive and inductive reasoning). But tests
that overemphasize critical/evaluative reading skills at the expense of more
fundamental skills like those of literal comprehension, for example, have a
limited utility. Teachers, after all, have long felt it necessary to
distinguish between "reading the lines, reading between the lines, and
reading beyond the lines" when teaching such a complex behavior as reading
comprehension. They are aware that, as Feder noted in 1938, "the tasks of
answering factual questions and of making inferences call to a considerable
extent on quite different fundamental skills in comprehension" (Davis, 1972,
p. 658). The movement toward teaching by objectives and mastery learning
has made such distinctions even more important. Tests that stress reasoning
processes fail to give proper emphasis to basic comprehension skills that
are developmentally and logically prior to more extensive processing of
information in a text.

Other than a few token items labeled "literal comprehension,"
traditional tests of reading comprehension, following Thorndike, make no
such distinctions. They are so biased toward a conceptualization of reading
as reasoning that they hardly constitute tests of comprehension as such.
Beyond fourth grade, when reading instruction concentrates on comprehension,
items on reading comprehension tests become increasingly indistinguishable
from verbal items on IQ tests (Singer, 1973). Besides correlating with IQ
tests of general verbal ability, traditional comprehension tests even
correlate substantially with non-verbal, figure-analogies tests of
intelligence (Carroll, 1972). Obviously, a student needs some modicum of
intelligence, especially in symbolic processes, to be able to read at all,
but if the "acquisition of symbol-sound correspondence is within the mental
range of a group of students and instructional conditions allow adequate
time for achieving the task, then IQ may have a significant relationship to
rate of acquisition but not to accomplishment of the task" (Singer, 1973,
p. 1).

Passage Dependency. Traditional comprehension test items are so biased
toward reading as reasoning that students can score well above chance on
significant numbers of test items without bothering to read the passages
upon which the questions are supposedly based. And yet any reading
comprehension test purports "to measure how well a student understands what
he is reading. The questions used to ascertain the degree of this
understanding are based on the tacit assumption that a direct relationship
exists between reading a passage and answering questions about it"
(Tuinman, 1973, p. 206). Weaver, Bickley, and Ford (1969) tested that
assumption with samples from many standardized tests of reading
comprehension. They discovered that college students who did not read the
passages upon which the questions were based answered as many questions
correctly as college students who did read the passages. Obviously, many
questions were not passage-dependent. The passages, that is, were not the
only sources of the information needed to answer the questions. A more
recent study of passage-dependency by Tuinman (1973) in grades 4, 5, and 6
found that the "average probabilities of correct responses with no passage
present ranged between .32 and .50 -- well above the expected chance score
of .25" (p. 206). The norm-referenced tests used in this study were (a) The
Nelson Reading Test, (b) The California Achievement Test, (c) The SRA
Achievement Series, (d) The Metropolitan Achievement Test, and (e) The Iowa
Test of Basic Skills.
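
The chance level quoted above follows from the item format: with blind guessing among four answer options, the expected proportion correct is 1/4 = .25. The short Python sketch below makes that arithmetic explicit and expresses the reported no-passage range (.32 to .50) as an excess over chance. The four-option format is inferred here from the .25 figure, and the function name is hypothetical.

```python
def chance_score(n_options: int) -> float:
    """Expected proportion correct when guessing blindly among n_options."""
    return 1.0 / n_options


chance = chance_score(4)  # four options assumed, matching the quoted .25
for observed in (0.32, 0.50):  # range reported for the no-passage condition
    excess = observed - chance
    print(f"observed {observed:.2f}: {excess:.2f} above chance ({chance:.2f})")
```

At the upper end of the range, students answering without the passage did twice as well as guessing alone would predict.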

Processing information derived from written text may well be similar to
processing information derived from other verbal and non-verbal sources
(Smith, 1975). But when standardized reading comprehension tests stress
inferential and related reasoning processes to the extent that the
information in the text becomes superfluous to the test items, then the
conceptualization of reading comprehension implied by such tests strains
credibility. Rather than making inferences about what the text means as a
consequence of having read the text, students can infer "the meaning" of
the text (as interpreted by the test-writer) from the test items themselves.
The information that is assimilated to cognitive structure may not be
derived from the text. Therefore, scores on such tests cannot be used as
evidence that the students did in fact comprehend "the text." These scores
imply comprehension of the test items rather than comprehension of the text
itself.

Besides straining the analogy between reading and reasoning, the passage
independence of the items on standardized reading comprehension tests raises
more serious questions about the objectivity, utility, and validity of such
tests. Most instructional reading programs, including those that teach
"sampling procedures," necessarily promote a careful perusal of the text.
Indeed, how are disputes about the meaning of a text ever resolved except
by reference back to the text itself? (The relevancy of biographical and
other extra-textual information, for instance, can only be determined by
reference to the meanings implied by the text itself.) What use is a reading
teacher to make of scores from "reading comprehension tests" that invite
students to ignore the text, that promote "comprehension" skills specific
to test-taking rather than comprehension skills in general? In fact, teachers
characteristically develop comprehension skills by using questions to direct
attention to salient features of the text, and, in doing so, they run the
risk of training students to validate the teacher's interpretation of the
text at the expense of the students' own perceptions. But teachers have
a saving grace: they are in a position to recognize and promote the student's
independent efforts to interpret the text. No such opportunities exist on
tests. Given the multiplicity of interpretations to which most segments of
connected discourse are subject, what justification is there for the idiosyn-
cratic interpretations represented by the questions and "correct" answers on
any given standardized reading comprehension test? Granted, most skillful
readers would usually accept the validity of the interpretation of the text
implied by most of the test items on standardized tests. But why any parti-
cular interpretation at the arbitrary exclusion of others in a test which
claims to measure the general ability to apprehend the meaning of printed
discourse? Or do the test items represent a random sample of all possible
interpretations? Clearly not. No two test writers interpret a given text
in the same way, and this again raises the problem of specifying what
standardized comprehension tests actually measure.

Factor Analysis

Factor analyses of scores from standardized comprehension tests, rather
than clarifying what such tests measure, only reveal the hodgepodge
conceptualizations underlying them. Vocabulary knowledge, test-taking
skills, and comprehension skills are all subsumed under a vague, global
notion of "comprehension." Davis (1941), for instance, first identified
several hundred "reading comprehension skills," and then, noting a
considerable overlap, reduced them to nine "testable skills" (1944). In
1968, he reaffirmed the independent existence of eight of these skills.

Davis' eight unique skills are listed in Table 1.1. Of these,
Skill 3--finding answers to questions, with a significant 13 and 7 percent
of nonchance variance--can be excluded because it is a test-taking rather
than a comprehension skill. (The test items themselves introduce reasoning
and inferential processes and difficulties which may be extraneous to the
actual comprehension process [Bormuth, 1970].) Moreover, Carroll (1972),
noting "the unique variance residing in the tests of these skills," "is
tempted to conclude that perhaps only four or five of them merit recognition
as distinct skills, and even these are rather highly correlated in high-
school populations" (p. 2). Excluding Skill 3, the remaining skills of
significance--recalling word meanings; drawing inferences from the content;
recognizing a writer's purpose, etc.; and following the structure of a
passage--represent polar extremes in a hierarchy of reading skills, as
would be expected from tests which seem so indebted to Thorndike's con-
ceptualization of comprehension.

                               Table 1.1

          Percent of Nonchance Variance of Each of Eight
          Skills That Is Unique in the Set of Skills Used

                               (N = 938)

                                                 Cross-validation by
                                              --------------------------
                                              Items and       Items only
    Skill                                      day[b]
------------------------------------------------------------------------
1.  Recalling word meanings                       35              29
2.  Drawing inferences about the meaning
    of a word from context                        -1[a]            8
3.  Finding answers to questions answered
    explicitly or merely in paraphrase            13               7
4.  Weaving together ideas in the content          5               5
5.  Drawing inferences from the content           23              18
6.  Recognizing a writer's purpose,
    attitude, tone, and mood                      14               8
7.  Identifying a writer's techniques              8               3
8.  Following the structure of a passage          15              12
------------------------------------------------------------------------
Note. From "Research in Comprehension in Reading" by F. B. Davis,
Reading Research Quarterly, 1968, 4, 499-545.

[a] The negative entry in the table probably represents a chance deviation
from a zero or slightly positive true value.

[b] An equivalent form of the same test was given to the same students
after an interval of one or two days.

The largest nonchance variance is represented by the skill of recalling
word meanings. But the skill is "measured by recognition vocabulary items"
(Davis, 1972, p. 663)--words in isolation, that is, or words in limited
contexts (usually no more than a phrase). Now obviously word knowledge is
necessary to comprehension, but the skills involved in recognizing words
in isolation or in limited contexts are quite different from the skills
involved in interpreting the interrelationships between word meanings and
syntax in connected discourse. Skill 2, on the other hand, deals with words
in context and is much closer to comprehension of connected discourse, but it
represents only an insignificant percentage of nonchance variance in
traditional comprehension tests.

While the major variance, "recalling word meanings," "as measured by
recognition vocabulary items," seems to lie outside the pale of reading
comprehension (apprehending the meaning of connected discourse), the remain-
ing skills--drawing inferences, recognizing purposes, etc., and following
structure--represent the upper reaches of the hierarchy of comprehension
skills, the reasoning processes. From his analyses, Davis drew the general
conclusion that ". . . [comprehension] is largely dependent on knowledge
of word meanings and on ability to reason in verbal terms" (Davis, 1972,
p. 663).

Objectivity

To be maximally useful in measuring achievement in reading comprehen-
sion, a test must be objective enough to be reproducible. That is, several
test writers working independently with the same corpus of materials must
be able to produce essentially the same test. What this means in practice
is that test writers, when selecting the materials to be included in the
test and writing questions about those materials, must follow a detailed,
explicit rules system (somewhat like a computer algorithm) which radically
limits the opportunity to make subjective decisions based on personal
biases and idiosyncrasies. Several advantages are gained by such objectivity:
(a) if the test is reproducible, there is an objective basis for claiming
that two different forms of the test should have the same label (e.g.,
"reading comprehension"); (b) it becomes possible to examine otherwise
arbitrary claims about what the test actually measures, for its genesis is
public and traceable; (c) it also becomes possible to compare the results
of two different tests in relation to the reading skills being measured;
and (d) different forms of the same test can be compiled easily and used to
monitor reading development over short periods of time.

Unfortunately, test development procedures for standardized tests of
reading comprehension fall far short of this kind of objectivity. Publishers
have developed a careful, traditional procedure for constructing standardized
tests, but subjectivity is apparent at every stage of the process. Test
writers begin, for instance, by developing an outline of the information
the test will cover. But since "the outlining procedure is ill-defined,"
it "is difficult to verify that an item measures the content claimed by the
label" (Bormuth, 1970, p. 12). Then, the passage-sampling procedure is
subjective. Once the passages are selected, the test writer is constantly
making subjective decisions about which questions to write on each passage.
Some questions are rejected as too easy, others as too difficult or too
wordy, and so on. The result, as Bormuth (1970) has commented, is that the
test writer is "implicitly designing the test" as he goes along, "but doing so
in a manner that is not open to inspection and . . . review" (p. 15).
Perhaps it is precisely the relevant course content that is present in the
final form of the test, but the substantial lack of objectivity makes
verification impossible.

To be precise in measuring gain or growth, an achievement test scale
must have equal intervals and a meaningful zero point. A ruler, for
instance, is a measurement device with equal intervals and an absolute zero.
An inch at either end of a ruler is still an inch; or an inch in linear
space is equal to any other measure of one inch in linear space. But part
of the "meaning" of that measure of one inch is the possibility of zero
length or no inches. The interval of one inch is an absolute measure that
does not need to be transformed for comparison with another measure in
inches. Once a test is developed to measure gain on a scale with equal
intervals and a meaningful zero point, it becomes possible to interpret
differences in raw scores as true quantitative measures of gain or growth
within individual students over a period of time.

In addition to equal interval scaling and a meaningful zero point,
a useful test development procedure must be based on person-free item
calibration and item-free person measurement. Such a procedure would result
in test scores that could be interpreted in terms of an absolute scale
(person-free) rather than in relation to the particular students who took
part in the original calibration of the test. The procedure would also
produce test scores that would not be dependent on the particular items
used on the test (item-free). Reading comprehension tests scaled in this
way would result in measures of achievement on a scale from "little ability"
to "maximum ability." Interpretations of raw scores would be referenced
directly to this equal-interval, meaningful-zero scale. Equivalent and
parallel test forms could then be assembled for accurate, periodic testing.
School districts could also compare the effects of different educational
treatments on individual students or groups of students.

But norm-referenced tests do not have these scaling properties.
Instead, test scores are referenced to the particular group of students
used to norm the test. The estimate of reading ability these tests produce
is dependent upon particular people and the specific content of the test.
Comparisons between test scores on different forms of the test are made
difficult, in part, because the content of the two forms is not necessarily
comparable; scores cannot be interpreted easily because there is no
meaningful zero point and no equal-interval scale. Standardized, norm-
referenced tests, therefore, cannot produce accurate, easily interpretable
measures of achievement in reading comprehension.

Sensitivity

Rather than setting out to assess gain within individuals, standardized,
norm-referenced tests are designed to "measure the stable, between-
individual differences that traditionally have been of primary interest to
psychological testing" (Carver, 1974, p. 512). The design principles of
such tests, that is, deliberately maximize individual differences. For
example, questions that most students answer either correctly or incorrectly
are eliminated from the tests in the experimental stages. The most efficient
question, for purposes of differentiating between individuals, has a passing
proportion of .50 (or .625 when corrected for guessing). The tests, then,
are referenced to a norm group rather than to an absolute criterion or a
criterion based on specifiable test content; they are "so constructed that
at each grade level they attain a normal distribution of test results"
(Singer, 1973, p. 4). The reliability is determined by internal consistency
and the stability of response to the same test administered at two different
times. Any sensitivity norm-referenced tests might have for measuring gain
or growth within individuals over a period of a school year is systematically
eliminated in the item-selection process. Standardized, norm-referenced tests,
that is, are insensitive to short-term achievement in reading comprehension.
As a consequence, they are also insensitive to differences in educational
treatments.
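
The relation between the two passing proportions quoted above (.50 and .625)
can be checked with a short calculation. The following is a minimal sketch,
assuming four-option multiple-choice items (consistent with the chance score
of .25 quoted earlier) and the usual linear rescaling between the chance
score and a perfect score. The function names are illustrative only and do
not come from any of the studies cited.

```python
def target_passing_proportion(n_options: int, true_proportion: float = 0.5) -> float:
    """Observed passing proportion when `true_proportion` of students know
    the answer and the rest guess at random among `n_options` choices.
    (Assumption: blind guessing with equal probability per option.)"""
    chance = 1.0 / n_options
    return true_proportion + (1.0 - true_proportion) * chance


def corrected_for_guessing(observed: float, n_options: int) -> float:
    """Rescale an observed passing proportion so that the chance score
    maps to 0 and a perfect score maps to 1."""
    chance = 1.0 / n_options
    return (observed - chance) / (1.0 - chance)


if __name__ == "__main__":
    observed = target_passing_proportion(n_options=4)
    print(observed)                             # 0.625
    print(corrected_for_guessing(observed, 4))  # 0.5
```

Under these assumptions, an item passed by 62.5 percent of students is one
that half of them can answer without guessing, which is the point of maximum
differentiation between individuals described above.
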
Format

Commercial firms design and develop most of the standardized, norm-
referenced tests that schools depend on. The design, construction, and
validation of these tests is time-consuming and requires considerable ex-
pertise, as well as what some commentators (e.g., Davis, 1964) call
"artistry," so they are, of course, expensive testing instruments.

Part of the salability of these costly tests lies in their format:
they come in pre-assembled packages that are easy to administer. But it is
precisely that inflexible format which is the source of their limited utility
and, as a consequence, their enormous hidden cost. The rigid format, for
instance, containing only a few parallel test forms, permits only one simple
evaluation design, a pre- and a post-test. Moreover, because the pre-
packaged tests cannot be taken apart and reassembled to construct a test of
appropriate difficulty for an individual student or a particular group of
students, standardized, norm-referenced tests yield imprecise measures of
achievement. In order to measure student achievement in reading comprehension
accurately, the test administrator must assign the student to a test form
with a level of difficulty which is very close to the student's actual level
of reading achievement. The more the test varies in difficulty from the
student's actual reading ability, the more imprecise the measure of that
ability. Since standardized, norm-referenced tests are inflexible in format,
since they contain few parallel test forms, and since each form covers many
levels of difficulty (e.g., a 4th grade student may face 10th grade reading
materials), it is nearly impossible to measure an individual student's
reading achievement accurately. Rigid test formats, then, are not only
inherently expensive, but they prevent school systems from implementing
satisfactory evaluation designs.

Standardized, norm-referenced tests of reading comprehension are reliable
predictors of academic success, but they are entirely inadequate as measures
of ability or achievement in fundamental comprehension skills. Though the
publishers of standardized comprehension tests are loath to specify what
such tests measure, factor analyses, high correlations with intelligence
tests, passage independence of test items, and reviews of the content of the
tests reveal a bias toward critical and evaluative reading skills. In other
words, standardized comprehension tests slight what is usually called
"literal comprehension"--those very abilities (1) that are basic to more
advanced reading comprehension skills, (2) that take a considerable por-
tion of the reading and instructional time in most reading programs, and (3)
that are vital to the development of a literate populace, a basic goal of
school systems.

Indeed, if "levels" of comprehension (e.g., reading the lines, reading
between the lines, and reading beyond the lines) are conceived as steadily
expanding contexts for interpretation of the text (increasingly extensive
relationships between the information in the text and the cognitive structures
of the reader), it can be argued that there is little possibility of ever
locating more advanced comprehension skills in that continuum until the base
line is drawn, until literal comprehension is defined and tests of it
thoroughly validated. Until then, tests of critical and evaluative reading
skills (i.e., standard comprehension tests) are condemned to float indefinitely
in the limbo of vague, global conceptualizations which are antithetical to the
movement toward teaching by objectives and mastery learning. For, unless
test-makers can identify the lowest level of meaningful synthesis (e.g.,
"literal comprehension") between the linguistic features of the text and
the cognitive structures of the reader, what possibility is there for identi-
fying more extensive and complex interrelationships?

In addition to these conceptual and theoretical difficulties, standard-
ized comprehension tests have limited utility due to a lack of objectivity
in test construction; scaling properties that make test scores difficult to
interpret; insensitivity to gain within individuals and differences in
educational treatments; and rigid, costly formats.

It is apparent that school districts need a test of literal comprehen-
sion based upon an explicit, viable conceptualization of literal comprehen-
sion. Further, such a test must be objective in construction, scaled with
equal intervals and a meaningful zero point, sensitive to gain within
individuals and differences in instructional treatments, and flexible in
format.

CHAPTER II

COMPREHENSION



Any attempt to measure reading comprehension should begin with a con-
ceptualization of comprehension that is grounded in conventional usage.[2]
People often use the phrase "reading comprehension" to refer to the act or
process of apprehending the meaning of written discourse. Since the process
of comprehension is complex, extremely rapid, and entirely covert, the test-
maker is necessarily limited to attempts to measure the product (rather than
the process) of comprehension. And the product of comprehension, the thing
to be apprehended, is meaning. As noted in the preceding chapter, a test of
reading comprehension must measure a student's apprehension of the meaning
of a particular segment of printed discourse. The obvious implication is
that the test-maker must first identify the meaning (Carroll, 1972) or, more
generally, the kinds of meaning (e.g., explicit) that are to be apprehended.

But the theoretical problems involved in identifying the meaning to be
apprehended (much less measuring the student's apprehension of it)
are labyrinthian. "Meaning" is even more conceptually ambiguous than
"comprehension," and many a theory of comprehension, as Smith (1971) so
wryly notes, has foundered on efforts to determine what "comprehension
and meaning 'really are'" (p. 185). But the labyrinth seems to be
unavoidable. Efforts to evade conceptual difficulties with "operational"
definitions of comprehension have not resulted in viable tests of reading
comprehension. Besides, what possible justification is there for labeling
a test "reading comprehension" without "marshalling evidence in the form of
theoretically relevant empirical relations to support the inferences that
an observed response consistency has a particular meaning" (Messick, 1975,
p. 955)?

[1] The following discussion of comprehension and meaning is based on a
model of reading as a constructive language process, the most recent
expression of which is Smith (1975). For a review of the evidence for such
a model, see Ryan and Semmel (1969). Katz (1972) was the primary source
for the competence model assumed by the performance model.

[2] "Ordinary language often embodies concepts which have developed and
endured because they capture something of significance to human beings.
Thus, ordinary language concepts have, at least, a prima facie right to
our consideration, especially when we are studying human beings, and they
should be replaced by a technical vocabulary only when there are clear
empirical advantages in doing so and when we are clear about the human
significance of the change introduced" (Strike, 1975, p. 462).

Operational Definitions

The point is important enough to warrant an extended example. Attempts
to avoid pursuing the psycholinguistic ramifications of a given test of
comprehension often result in "operational" definitions that defy conven-
tional usage and consequently promote misunderstanding in a field already
rife with ambiguous concepts. In the study by Bormuth, Manning, Carr, and
Pearson (1970), for example, "a comprehension skill is defined as the ability
to respond correctly to a question beginning with the letters 'wh' which
deletes one of the immediate constituents of a syntactic structure" (p. 351).
Now obviously teachers traditionally ask such who-what-which-where-when-how-
why questions in order to direct attention to important features of the text
under scrutiny and to promote "comprehension skills," but when the text is
available for perusal, as it is on a comprehension test, a student with minimal
syntactic competence can locate the correct answer to such questions without
necessarily understanding what the sentence means (Anderson, 1972; Carroll,
1972). The limitations of the wh-item as an operational definition of com-
prehension are evident in the following nonsense sentence:
The Izilbe gatgotted the dizgleboo. Who gatgotted the dizgleboo? Obviously
the Izilbe. But what does the sentence mean? And the problem is not exag-
gerated by nonsense sentences. Consider the following statement: Incantatory
glee reverberated in Johann's minuscule cerebrum. Students with a minimal
syntactic competence and a little test-wiseness could locate incantatory in
the text without the vaguest idea of what incantatory or the rest of the
sentence means. As students become familiar with such test items, they
should be able to locate the right answer in the text long after the reading
passages exceed their vocabulary knowledge. Verbatim transformations of
sentences in a text, therefore, have a limited life-span as viable tests of
comprehension.
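
The argument above can be made concrete with a small program that "answers"
verbatim wh-items by string matching alone. This is only an illustrative
sketch, not the item-writing procedure of Bormuth et al. (1970): the function
name and the matching rule are assumptions, and the rule handles only the
simple case in which the question repeats the sentence verbatim with one
constituent replaced by a wh-word.

```python
import re
from typing import Optional

# Wh-words that may open a verbatim question (illustrative list).
WH_WORDS = r"(who|what|which|where|when|why|how)"


def answer_verbatim_wh_item(sentence: str, question: str) -> Optional[str]:
    """Return the constituent that the wh-word replaced, found purely by
    matching the rest of the question against the sentence. No word
    meanings are consulted at any point."""
    # "Who gatgotted the dizgleboo?" -> "gatgotted the dizgleboo"
    remainder = re.sub(rf"^{WH_WORDS}\s+", "", question.strip(), flags=re.IGNORECASE)
    remainder = remainder.rstrip(" ?")
    body = sentence.strip().rstrip(" .")
    start = body.lower().find(remainder.lower())
    if not remainder or start == -1:
        return None  # the question is not a verbatim transformation
    deleted = body[:start] + body[start + len(remainder):]
    return deleted.strip() or None


if __name__ == "__main__":
    print(answer_verbatim_wh_item(
        "The Izilbe gatgotted the dizgleboo.",
        "Who gatgotted the dizgleboo?"))        # The Izilbe
    print(answer_verbatim_wh_item(
        "Incantatory glee reverberated in Johann's minuscule cerebrum.",
        "What reverberated in Johann's minuscule cerebrum?"))  # Incantatory glee
    print(answer_verbatim_wh_item(
        "He fractured his arm.",
        "Who broke his arm?"))                  # None
```

The matcher succeeds on the nonsense sentence because it never needs to know
what any word means, and it fails as soon as a single word is paraphrased.
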

Paraphrase questions, on the other hand, rather than solving the diffi-
culties inherent in such "transformed verbatim questions," only reintroduce
some of the same theoretical problems that plague standardized comprehension
test items. "Any change in wording, including substitution of synonyms,
usually alters [the] meaning" of the original text (Johnson, 1975, p. 429;
Alston, 1964; Lyons, 1968; Quine, 1960; Smith, 1975, p. 104); vocabulary
changes, that is, introduce the test-maker's own idiosyncratic interpretation
of the text--his approximation of what the text "means"--into the test items.
Mirth, for example, simply does not mean the same thing as glee in the
"incantatory" sentence above. Even simple active and passive transformations
engender different understandings (Johnson, 1975, p. 437; Anisfeld and
Klenbort, 1973; Herriot, 1970; Offir, 1973; Smith, 1975, p. 104). (In light
of the apparent inability to change the wording of the text without "engen-
dering different understandings," the very concept of a paraphrase--changing
the words while maintaining the same meaning[3]--seems self-contradictory.)



[3] The notion of meaning here is intended to cover "construal" and
"stylistic" features of an utterance (cf. Katz, 1972).

In addition, the inflated syntax of some of the compound wh-items
described by Bormuth et al. (1970) tends to make the questions more difficult
to understand than the original sentence in the text, a common problem with
standardized tests. For example, from the sentence, He (the boy) fractured
his arm, the questions, Who was it who fractured his arm? and Who was it who
broke his arm? are derived (p. 352). (It is also worth noting that, in
Anderson's [1972] opinion, only correct responses to paraphrase questions
among wh-items can be adduced as evidence of comprehension, yet the examples
above, even though they are labeled "paraphrase," fail to conform to Anderson's
definition: Two statements are paraphrases of each other if "1) They have
no substantive words [nouns, verbs, modifiers] in common and 2) they are
equivalent in meaning" [p. 150]. In the paraphrase-transformation quoted
above, however, only the verb is changed--fractured is replaced by broke.)

In summary, correct answers to verbatim transformations cannot be cited
as sufficient evidence for comprehension because it is possible to answer
wh-questions correctly without comprehending the sentences upon which they
are based. The operational definition of comprehension, that is, does not
"preclude plausible rival interpretations" (Messick, 1975, p. 959). Para-
phrase transformations, on the other hand, are subject to many of the same
criticisms that are leveled at standardized comprehension test items.

The problem is how to write test items that are impossible to answer
(beyond guessing) without apprehending the meaning of the text upon which
the questions are based. The brief critiques of the wh-items and standard-
ized comprehension tests in this report should make it evident that test-
makers are caught on the horns of a dilemma: If they avoid imposing
idiosyncratic meanings on the text by writing test items based on minimal
transformations of the text, then it is possible to answer the questions
without apprehending the meaning of the text. On the other hand, if test-
makers change the wording of the text in any way in the test items in order
to force the student to interpret the text, then they impose idiosyncratic
interpretations on the text without randomly sampling all possible inter-
pretations of it and introduce unnecessary difficulties in the syntax and
vocabulary of the test items. How then is the meaning which students are
to apprehend to be identified without prejudicing it? Further, is it pos-
sible to conceptualize meaning without getting bogged down in the "intermina-
ble controversies . . . about what kind of thing meaning is" (Katz, 1972, p. 1)?

Meaning

In light of the criticism that both standardized comprehension tests and wh-
items (paraphrase transformations) impose idiosyncratic interpretations on
the text, it appears to be crucial for test-makers to identify, insofar as
possible, the relationship between meaning and the orthography on the printed
page rather than to speculate on the absolute nature of meaning, since such
speculations inevitably collapse into philosophical quibbles. For the limited
purposes of this discussion, the relationships between meaning and the text
are reduced to three simplified possibilities: (1) Meaning is derived from
the text; (2) meaning is imposed upon the text; or (3) some combination of
(1) and (2).

Deriving meaning from the text. The first possibility--that
meaning is derived from the text--implies that meaning is in the text, or,
more exactly, that meaning is in "language" and represented rather
accurately by the orthographic system on the printed page.[4] Thus, a
transformational grammarian might contend that the meaning of a discourse
is a result of the "grammatical and semantic relations which obtain within
and among the sentences of the discourse" (Katz and Fodor, 1967, p. 172).
In Katz's (1972) semantic theory,

    the semantic component of a grammar contain[s] a dictionary[5]
    that formally specifies the senses of every syntactically
    atomic constituent in the language. It [i.e., the semantic
    component] must also prescribe rules for obtaining repre-
    sentations of the senses of syntactically complex con-
    stituents, which are formed from representations of the
    senses of their atomic constituents in the dictionary.
    The dictionary provides the finite basis and the rules
    provide the machinery for projection onto the infinite
    range [of the possible combinations of the senses of
    the lexical items]. (Katz, 1972, p. 33) The idea
    underlying this conception is that the logical form of
    a sentence is identical with its meaning as determined
    compositionally from the senses of its lexical items
    and the grammatical relations between its syntactic
    constituents. (p. xxiv)

In this "compositional" account of meaning, the semantic component of the
grammar

    operate[s] exclusively on the underlying phrase markers
    in the description of a sentence.... Semantic interpretation
    proceeds, first, by an assignment of lexical readings from
    the dictionary to the atomic constituents of a sentence and,
    then, by an assignment of derived readings to each syntacti-
    cally complex constituent by the operation of the projection
    rule upon the readings of its component parts. (Katz, 1972,
    p. 415)

Thus, initial syntactic analysis--identification of underlying phrase markers--
is prior to (in the sense of directionality) the interpretation of the deep
structure (underlying phrase markers) of a sentence: "The syntactic compo-
nent is the generative source of a grammar. Its output is the input to both
the phonological component and the semantic component"[6] (p. 31).

[4] Phonological rules may be bypassed in the interpretation of written dis-
course (Venezky, 1967). Chomsky and Halle (1968) also point out that meaning
is more directly represented in the orthography on the page than it is in the
phonological component of language (e.g., sane, sanity). "There is an essen-
tially arbitrary relationship between sound and meaning so that properties of
phonetic shape do not predict properties of propositional form and vice versa"
(Katz, 1972, p. 367).

[5] This is of course the "ideal" dictionary, not to be confused with that
tribe of paper dictionaries exemplified by the Oxford English Dictionary.

[6] There is some debate among transformational grammarians over the
interpretation of final derived phrase markers by the semantic component
(Chomsky, 1970).

Now this is an attractive theory for a test-maker because it seems to
allow for the derivation of meaning from a given text by a finite set of mean-
ings combined by a rule system which can be explicitly stated, thus raising
the possibility of objectively deriving and specifying all possible interpre-
tations of a text. Meaning is therefore in language, free from the disposi-
tional limitations of any reader who might encounter language in one of its
empirical manifestations. The medium itself is never at fault in any failure
to encode or decode meaning accurately: "Each human thought is expressible by
some sentence of any natural language," and failures to express or derive
meanings accurately are not attributable to failures in the expressive
capacities of language but rather to an individual's lack of skill "in
exploiting the richness of his language" (p. 19).
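
The appeal of a finite dictionary plus an explicit rule system can be seen
in a toy sketch. The code below is a loose illustration of the general idea
of compositional interpretation, not Katz's formalism: the dictionary
entries, the sense labels, the tuple encoding of constituents, and the single
"projection rule" are all hypothetical simplifications introduced here.

```python
from typing import Dict, Union

# A constituent is either an atomic word or a tuple of sub-constituents.
Constituent = Union[str, tuple]

# Hypothetical finite dictionary: one sense label per atomic constituent.
DICTIONARY: Dict[str, str] = {
    "the": "DEFINITE",
    "boy": "BOY",
    "fractured": "FRACTURE",
    "his": "POSSESSIVE",
    "arm": "ARM",
}


def reading(constituent: Constituent) -> str:
    """Assign a reading to a constituent.

    Atomic constituents receive a lexical reading from the dictionary.
    Complex constituents receive a derived reading built from the readings
    of their parts (a stand-in for a projection rule)."""
    if isinstance(constituent, str):
        return DICTIONARY[constituent.lower()]
    parts = [reading(part) for part in constituent]
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    # Assumed bracketing: [[the boy] [fractured [his arm]]]
    sentence = (("the", "boy"), ("fractured", ("his", "arm")))
    print(reading(sentence))
    # ((DEFINITE BOY) (FRACTURE (POSSESSIVE ARM)))
```

Because the rule is recursive, five dictionary entries already yield
readings for an unbounded number of bracketings, and the same input always
produces the same reading. The sketch says nothing, however, about what an
actual reader does with the sentence.
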

But this brief outline of Katz's semantic theory should make it evident
that such a possibility for objectively deriving meaning from a discourse is
based upon a competence rather than a performance model. Katz's model is
erected on the notion of sentence types rather than tokens:

We based our study of the meaning of sentence types on an
idealization that allowed us to focus exclusively on lin-
guistic meaning by abstracting away every aspect of language
that does not reflect pure grammatical competence. We
observed early in the book that even a complete theory of the
meaning of sentences and other constituent types is a far cry
from a full theory of linguistic communication. (p. 443)

The test-maker, however, cannot ignore the communicative limitations of the
reader since they affect the response consistency of the test and are,
therefore, precisely the point of interest of the test-maker (and the teacher).

Katz distinguishes between a competence and a performance model as follows:

In the theory of linguistic competence we seek to state
the system of rules that formally represents the ideal
linguistic structures that underlie the utterances of
natural speech. We idealize away from the distortions
and irregularities characteristic of natural speech and
concern ourselves with the systemization of those aspects
of natural speech that directly reflect the contribution
of a speaker's fluency. The theory of linguistic per-
formance, on the other hand, seeks to account for the
principles that speakers use in actually producing and
understanding natural speech. Accordingly, the study
of performance assumes the contribution of competence
and directs its attention to the manner in which the
contributions of various psychological factors--e.g.,
memory limitations, attention shifts, distractions,
brain damage, errors--interplay with linguistic fac-
tors to produce natural speech, with all its charac-
teristic distortions and irregularities. (p. 25)

Though a performance model "assumes the contribution of competence," test-
makers cannot wait for the definitive competence model (which Katz projects
into the next century). What is needed is a "working" model of the manner
in which readers apprehend the meaning of connected discourse, taking into
account the dispositional limitations of the reader and the differing
interpretations of a given text resulting from the various verbal and extra-
verbal contexts in which it occurs. The first consequence of shifting from
a competence to a performance model, however, is to lose the ability to
specify the meanings to be apprehended.

Imposing meaning upon the text. Psychologists, in marked contrast to
transformational grammarians, usually maintain that meaning is in the reader
rather than in the text or "language." Osgood (1967), for example, argues
that

The meaning which individuals have for the same signs
will vary with their behaviors toward the objects rep-
resented. This is because the composition of the
mediation process, which is the meaning of a sign, is
entirely dependent upon the composition of the total
behavior occurring while the sign-process is being
established. (p. 163)

Thus, in developing a model of reading as a constructive language process,
Smith (1975) locates meaning not in "language" but in "the underlying thought
processes of the language user" (p. 84). According to Smith, it is impossible
to derive meaning from a text because "there is no one-to-one correspondence
between the surface and deep structures of language" (p. 84).⁷ Meaning is
first imposed upon language in the deep structure prior to syntactic analysis
or, for that matter, prior to sampling any of the linguistic clues to the
meaning intended by the writer. That is, a reader makes an hypothesis about
what any given sentence in a discourse means based upon his expectations,
which are created by the general sociolinguistic situation in which the
discourse occurs, the meaning imposed upon the preceding sentences of the
discourse, etc. Having made his initial hypothesis, the reader then samples
selectively amongst the linguistic clues to meaning in the text. If the
original hypothesis is verified by the information he perceives in the text,
the reader moves on to the next sentence. If the original hypothesis is not
substantiated by the information in the text, then the reader either samples
more extensively or changes his hypothesis about what the sentence means and
samples again.

Now such a notion of meaning that is initially separate and distinct
from the linguistic clues to meaning in language certainly confronts the
full dispositional limitations of the reader and the various contextual
features in which the discourse occurs, but it is impossible for the test-
maker to identify "the meaning" to be apprehended, for meaning is essentially
and ultimately idiosyncratic. That is, "comprehension," in Smith's performance
model, refers to the assimilation of the information⁸ in the text to
the cognitive structures of the reader. Given the location of meaning in
the cognitive structures of individual readers, it follows that "the meaning"
of a particular segment of printed discourse varies as a function of the
disparity between the cognitive structures brought to bear on the discourse.
Different readers, as any student of literature knows, interpret the same text
in different ways.⁹ Alternate possibilities for interpreting an utterance
often surprise a reader/listener, which is only an indication that the
decoder's perceptions about the utterance are restricted by his own cognitive
"set." Moreover, any reader comes to the same text on different occasions
with varying moods, degrees of attentiveness, purposes, presuppositions,
available knowledge, etc., all those personal idiosyncrasies eschewed by
a competence model (Katz, 1972, p. 15). That is, the "array" of cognitive
categories that any reader can bring to bear on the information in the text
varies with the dispositional limitations of the reader. Therefore, the
interaction between the information in the text and the cognitive structure
of the reader varies not only between readers but also within readers.

⁷ "One reason that the surface structure of language does not have a
one-to-one relation with the underlying deep structures of thought is that
case relations can be represented in a variety of ways" (Smith, 1975, p. 103).

⁸ Perception of parts of the orthography of the text as "information"
(rather than "noise") is itself an act of comprehension (Smith, 1975).

Not only is meaning (theoretically) idiosyncratic, but it may also be
non-verbal and non-observable. As noted previously, Smith (1975) contends
that meaning lies in the thought processes of the language user and that
there is no one-to-one correspondence between meaning and the surface
structure of language. Pursuing the notion further, Smith (1971) is forced
to characterize "the meaning of a sentence [as] something global, a 'state
of mind,' an instantaneous set of relationships established in the cognitive
organization" (p. 194). Meaning is merely the absence of uncertainty (Smith,
1971, 1975).

Now conceptualizing meaning in terms of the non-verbal, non-observable
dispositional idiosyncrasies of the reader does not in itself preclude
measurement. Psychologists are long used to measuring dispositional
phenomena that are non-verbal and non-observable (and that sometimes do not
exist except in the imaginations of psychologists). Brown (1958), for
example, writes that "a disposition is discovered by creating various
contingencies and observing responses" (p. 103). But such measurement techniques
are rudimentary and have never proven very successful, even in dealing with
single words, much less the complex interrelationships among the words of a
sentence (Miller, 1965).

⁹ The "definitive" reading of a text is a parochial notion, always
deflated in time.

By pursuing the full implications of the performance model of reading
as a constructive language process, the test-maker is left in a considerable
quandary: How can the meaning(s) of a segment of connected discourse to be
apprehended by the student be identified if they are infinitely variable,
non-verbal, and beyond the capacity of psychometricians to measure? Further,
if meaning is non-verbal and there is no one-to-one correspondence between
meaning and surface structure, then what appears on the printed page is
never more than an approximation of the meaning as intended by the writer
or the meaning apprehended by the reader. The speaker or writer straining
to say or write what he "really" means comes immediately to mind. Katz's
ascription of the failure to express a thought accurately to the user and
not to language is turned around here; since meaning is not in language,
and there can be no efficient transfer of meaning from the simultaneity of
non-verbal cognitive structure to the temporal realization of meaning in a
string of morphemes, the failure to express a thought accurately lies finally
in the medium rather than in the language user.

If meaning is essentially non-verbal, and idiosyncratic representations
of meaning in verbal form are never more than approximations of "the meaning"
intended by writers or "the meaning" as apprehended by readers, then it is
impossible for a test-maker to identify the meanings to be apprehended by
the student. Efforts to list all possible interpretations of a text and
then to sample randomly from that list are fundamentally misconceived.
Given that it is impossible to specify all the acceptable meanings of a text,
is it possible for a test-maker to identify, in general, the "kinds" of
meanings (e.g., explicit) to be apprehended? Further, is it possible to com-
bine some of the features of Katz's competence model, which allows for the
identification of both specific meanings and types of meaning, with features
from Smith's performance model, which allows the test-maker to identify the
dispositional limitations of the reader and the context within which a text is
interpreted? Finally, is it possible to specify the "level" or "degree" of
comprehension (e.g., literal) indicated by a particular response type?

Explicit meaning. Teachers often identify meanings as explicit or
implicit, literal or inferential, etc. If such distinctions are viable,
then it is possible to specify the kinds of meanings to be apprehended at
a given "level" of comprehension. For example, literal comprehension can
be defined--that is, located in relation to other "levels" of comprehension
on one side and in relation to non-comprehension, perhaps "mere verbaliza-
tion" or "recognition," on the other--as the apprehension of the explicit
meaning(s) of connected discourse. The preceding discussion should make
it evident, however, that there is no "explicit" meaning in the text even
though people speak (metaphorically) of what the text "explicitly says."
Clearly the text does not "say" anything; all meaning is implied or inferred
or derived from or imposed upon the linguistic clues to meaning in the text.

The explicit/implicit dichotomy in meaning seems to be founded upon
the distinction between denotative and connotative meanings. According to
Webster's New Collegiate Dictionary (1974), denotation refers to the "direct,
specific meaning" of a word or what is commonly called its referential aspect.
The preceding discussion, however, casts considerable doubt on the notion
that words, much less sentences, have any "direct, specific meanings" that
can be represented in the surface structure of language. "Denotation" and
its companion concept, "explicit meaning," are rooted in a failure to dis-
tinguish between reference, usually attributed to words in isolation, and
meaning, which always accrues to words in complex interactions with other
verbal and non-verbal experiences. Even if the referent of a word is identi-
fied as a "psychological entity" (Johnson, 1975), thus blurring the
distinction between denotation and connotation, there is still no one-to-one
correspondence between the referential associations in the brain/mind of the
reader and the orthography on the printed page. The clues to meaning in
orthography¹⁰ are minimal--simple temporal sequences representing the complex
simultaneity of cognitive structures. Moreover, "denotation" seems to result
from a habit of analyzing words in isolation (as if words ever existed in
"isolation") and leads to the false assumption that the meaning of a sentence
is the sum of the meaning of its parts (Miller, 1965). "A speaker's ability
to understand any sentence depends in part on his knowing the meanings of its
component morphemes" (Katz, 1972, p. 35), but a "morpheme" is quite a different
notion from a "word," which may only be an artifact of the orthographic system
(Smith, 1975). Besides, "the same set of morphemes can mean different things
when put in different syntactic arrangements" (Katz, 1972, p. 35); e.g.,
Philbert is munching on a crawdad / A crawdad is munching on Philbert.

Critics, psychologists, and linguists have long inveighed against
treating words as entities whose meaning could be isolated from the dynamics
of the contexts in which they occur. I. A. Richards (1936/1965), in what
amounts to a precursor of the "current" model of reading as a constructive
language process or a "psycholinguistic guessing game" (Goodman, 1970),
criticizes the attempt to take

the senses of an author's words to be things we know
before we read him, fixed factors with which he has to
build up the meaning of his sentences as a mosaic is put
together of discrete independent tesserae. Instead, they
are resultants which we arrive at only through the inter-
play of the interpretative possibilities of the whole
utterance. In brief, we have to guess them and we guess
much better when we realize we are guessing, and watch out
for indications, than when we think we know. (p. 55)

¹⁰ As noted previously, however, meaning may be represented more clearly
in orthography than phonetics.

Brown (1958) also contends that

an attempt to understand the meaning of a single linguistic
form in isolation from the total language process would be
rather like trying to understand a single bid in isolation
from a game of bridge. The meaning of a form, its total con-
ventional usage, involves the full language game. (p. 106)

Chafe (1972) pushes the interrelatedness of the component parts of speech
even further: "The point is that we do not use only part of what we know
when we say something, we use all of it, and there is no way to divide
knowledge that is linguistically relevant from knowledge that is not"
(p. 67). An analysis of the particular senses of the meaningful units of
discourse mushrooms quickly into a theory of knowledge.

Holistic meaning. It has been the contention of students of language
ever since Aristotle that meaning is holistic and that the sentence carries
the primary burden of meaning in discourse. Teachers, for instance, make
distinctions between "reading the line, reading between the lines, and reading
beyond the lines," which seems to be a more viable categorical scheme than
the denotative/connotative dichotomy simply because it deals with whole
sentences rather than words in isolation. Reading the line, reading between
the lines, and reading beyond the lines suggest that there is an expanding
context--intrasentential, intersentential, and extrasentential--within which
the information on the printed page can be interpreted or, from the point of
view of Smith's performance model, that there is an increasingly extensive
set of cognitive categories to which the information on the printed page can
be assimilated. If the sentence is identified as the primary vehicle for
conveying meaning in written discourse, there seems to be some possibility
of identifying the kinds of meanings to be apprehended, that is, the identity
and extensiveness of contextual constraint on the clues to meaning in and
beyond the text and the identity and extensiveness of the cognitive structures
to which that holistic information unit in the text has to be assimilated.
Thus the key to a synthesis of the specificity and objectivity of Katz's
competence model with the ability to account for the dispositional limitations
of the reader made possible by Smith's performance model is Katz's assertion
that "the empirical existence of a natural language lies in the linguistic
rules internalized by its speakers" (p. 15) and Smith's (1975) notion that
"language is [always] embedded in meaning" (p. 105).¹¹

For it is obvious that, in spite of the idiosyncratic, non-verbal nature
of meaning, speaker/writers and listener/readers do in general come to some
agreement about the meaning(s) that each of them apprehends in a given message
as indicated by their response behavior to the message. (Gross misunder-
standings are usually due to egregious errors in encoding the message--i.e.,
misapplications of the shared psycholinguistic rules system--or a misappre-
hension of the context within which the message is embedded--i.e., a mis-
application of the shared sociolinguistic rules system.) Though surface
structures may only be approximations of the deep structures of language or
the "abyssal" structures from which meanings may be generated, a well written
text clearly allows for some general agreement about what the text means, else
books would not have become as pervasive as they have in their brief
association with language.

¹¹ This is parallel with Johnson's (1975) notion that the "meaningfulness"
of a word is determined by "the extensiveness of the network of referential
associations" (p. 427).

What is so remarkable about language comprehension is that people do
understand each other, that the apprehension of the multiplicity of meaning
inherent in any relationship between utterance and decoder is in practice
such a rare event that it is more often a source of amusement (Smith, 1975,
p. 105) than dismay. Indeed, those people who develop skills in mining the
inherent multiplicity of meaning in surface structure are more often con-
sidered verbal "artists" than malaprops.

Commonality of meaning. What is the source of the apparent commonality
of meaning that can be apprehended in well written texts? It is interesting
to note that disputes about what a given text "means" are usually referred
back to what the text (metaphorically) "says" or, more specifically, to the
orthographic features on the printed page. Katz (1972) attributes this com-
monality of meaning derived from a text to the regularity of language:

if the way in which the speaker finds the words with
which to express his thought is not, at least in part,
the same way that his hearer recovers the thought from
the articulated words, the fact that different speakers
of the same language can freely exchange positions as
speaker and hearer, always associating the same thought
with the same sentence, would be incomprehensible.
Therefore, the basic question to ask is what are the
common principles for encoding and decoding. (p. 24)

Smith (1975), on the other hand, following the generative semanticists,
goes beyond language to the contingent circumstances in which an utterance
occurs to account for commonality of meaning:

The meaning of an utterance involves much more than
the words spoken. It depends on the entire situation,
verbal and non-verbal, in which the utterance is made....
Language is embedded in meaning, and meaning is always
limited by the prior purpose and understandings of
both speaker and listener, or writer and reader. (p. 105)

Sociolinguists, for example, have contributed greatly to understanding how
little linguistic information it takes to convey complex meanings in care-
fully defined social situations.¹² In a like manner, Freedle and Carroll
(1972) also contend that

Understanding language nearly always involves not only
comprehending the words and grammatical structures of a
message as linguistic symbols, but also taking account of
those knowledges, facts, or ideas that underlie the message
but are not explicitly built into it.... Much of the semantic
content of discourse is not to be found in the spoken or
printed words themselves, but in the prior knowledge that
the producer of a message assumes the hearer or reader to
have. (p. 360)

These attempts to account for common interpretations of the same text
reflect (at least) three separate notions of the relationship between meaning
and sentences in the text. (1) Compositional meaning: The meaning of a
sentence (type) is determined by the meaning of its constituent parts and
the grammatical interrelationships among them. Such a notion accounts for
synonymy, paraphrase, etc., but is insensitive to context and the dispositional
limitations of the reader. (2) Contextual meaning: The meaning of a sentence
is determined by the interrelationships among the compositional meaning(s) of
a sentence type and the context in which it occurs as a token. ("The upper
limit of semantic interpretation in a grammar concerned with conventional or
linguistic meaning [i.e., compositional meaning] is the starting point for a
theory of contextual construal" [Katz, 1972, p. 445].) Such a notion still
accounts for synonymy, paraphrase, etc., assuming that context can be specified,
but is insensitive to the dispositional limitations of the reader. (3) Dispo-
sitional meaning: The meaning of a sentence token is determined by the inter-
relationships among the compositional meaning(s) of the sentence type, the
specific context in which the sentence type occurs as a token, and the dis-
positional limitations of the reader. "Dispositional meaning" amounts to an
internalization of both compositional and contextual meaning. There is no
other way to account for the dispositional limitations of the reader in a
performance model. In a performance model, meaning is a function of the
interaction between the features of the text and the context as perceived by
the reader. Meaning is in the reader, and what can be observed in verbal or
non-verbal response to the text is only an indication of the meaning appre-
hended by the reader. Such a notion will account for some commonality of
meaning apprehended by readers as indicated by response consistency to the
text, but will not account for the kind of specificity of meaning implied
by "synonymy," "paraphrase," etc., since an extra-linguistic account of mean-
ing which is peculiar to the reader is interacting with compositional and
contextual meaning. Commonality of meaning is ultimately attributable to
similarities in cognitive structures among readers.

¹² See Bernstein (1959).

Note that compositional meaning is integral to all three accounts of
meaning above, but neither contextual nor dispositional meaning is integral
to compositional meaning (unless the latter is considered an expression of
the dispositional capacities of the reader). Note further that those aspects
of language which may be genetically coded--e.g., a tendency among natural
languages toward similar syntactic structures (Chomsky, 1968; Lenneberg, 1967)--
lie also within the compositional account of meaning. It is tempting to
explain all commonality of meaning as compositional; indeed, Katz does so,
using such terms as "literal," "linguistic," "conventional," and "composi-
tional" interchangeably. Hence, literal comprehension could be defined as
the apprehension of the compositional (i.e., literal, linguistic, or con-
ventional) meaning of the discourse, and compositional meaning could be
identified quite accurately as "the grammatical and semantic relations which
obtain within and among the sentences of the discourse" (Katz and Fodor, 1967,
p. 172). Those other "levels" of comprehension--"reading between and beyond
the lines"--which always lead to increasing diversity of interpretation could
then be distinguished quite precisely from literal comprehension.

But, as noted previously, any conceptualization of comprehension that
does not account for the dispositional limitations of the reader has a limited
utility for test-makers (and teachers). Comprehension is the apprehension of
meaning, and "apprehension" demands an account of the dispositional limitations
of the reader. Internalizing syntactic and semantic competencies in the
cognitive structures of the reader does not solve the problem either. No
act of apprehension of the meaning(s) of a sentence ever occurs free of
contextual contingencies. Meaning is always embedded in meaning (Smith, 1975,
p. 105). Any "level" of comprehension, therefore, involves compositional and
contextual meaning in dynamic interplay with the dispositional limitations of
the reader. The testing situation offers a unique opportunity to identify
and control the interactions between those three aspects of meaning.

Measuring Comprehension

Any attempt to measure a student's apprehension of the meaning of written
discourse introduces two additional factors--the item type and the testing
situation--into an already complex cognitive process. Measurement of a
process nearly always disrupts the process to some extent, and the process
reflected by the measurement procedure is partly peculiar to that procedure.
This is certainly true of the measurement of comprehension. The meaning
apprehended by a student on a reading comprehension test is a function of the
interaction between the text, the item type, the testing situation, and the
student. The failure to identify and control interacting features of the test
inevitably results in rival interpretations of response consistencies to the
test. It was argued in preceding sections of this proposal, for instance,
that correct responses to items on standardized comprehension tests were not
evidence of comprehension of the passages in question because the test items
were not passage dependent, i.e., the interactions between item type and text
were not defined and controlled. It was also argued that correct re-
sponses to verbatim transformations of sentences in the text were no evidence
for comprehension because the carefully controlled interaction between text
and item type excluded meaningful aspects of the discourse; i.e., the processing
of text was primarily syntactic rather than semantic. Finally, it was argued
that paraphrase transformations of sentences in the text as test items on
standardized comprehension tests introduced the test-writer's own idiosyncratic
interpretation of the text into the test items and often made the test items
more difficult to comprehend than the text itself. Again, the problem was a
failure to define and control the interaction between text and test item.

A correct response to a particular item type can be accepted as evidence
of comprehension of the text upon which the item is based only if it can be
demonstrated that the correct response is impossible (beyond chance) without
apprehending "the grammatical and semantic relationships which obtain within
and among the sentences" of the text. Passage dependency, in other words, is
the first demand to make of any item type. If the item type is not passage
dependent, then there is no further possibility of defining the interaction
between test item and text. Indeed, there may be none. The test item must
bear a specifiable relationship to both the syntactic and semantic features
of the text; in addition, the extensiveness of that interaction--e.g., intra-
sentential, intersentential, and extrasentential--must be identified and con-
trolled before the test can be labelled as to the "level" or "degree" of
comprehension it assesses (e.g., "literal" comprehension). This latter con-
straint on test construction amounts to a specification of the context within
which the information in the text is to be interpreted (e.g., does the item
type demand information other than "the grammatical and semantic relations
that exist within and among the sentences of the discourse," and, if so,
where does this information come from, who is expected to have access to it,
and what skills and processes are involved in integrating that extra-textual
information with the text?).

Moreover, any interaction between student, test situation, item type,
and text involves assumptions about requisite competencies on the part of
the student which must be matched properly by the test tasks; otherwise
response consistencies are again difficult to interpret. Texts vary greatly
in syntactic complexity, for example. How is the syntactic complexity of
the text to be ascertained and controlled in relation to the syntactic abil-
ities of the students taking the test? Is a student to be declared incapable
of relating inter-textual and extra-textual information meaningfully when
the text itself already exceeds his ability to apprehend the grammatical
relations that exist within and among the sentences of the text? What level
of linguistic competence is assumed by the test? How are general linguistic
and intellectual abilities to be differentiated from those abilities that
are peculiar to the item type?

Assuming that the student has the requisite competencies to perform
properly on the test, how is the test to be administered so as to eliminate,
insofar as possible, the non-requisite competencies (e.g., phonetic skills)
from the test scores? How can the test be designed and administered to re-
duce the effect of personality, motivational factors, test-taking skills, etc.?

Since traditional item types (i.e., questions based upon the text) make
the interaction between item type and text so difficult to identify and con-
trol, the obvious solution to that problem is to eliminate questions. The
following two chapters analyze the cloze procedure as a test of comprehension
without questions. An attempt is made to specify the interactions between
text and item types on several variations of the cloze procedure. Chapters
on validity later in this report attempt further specifications of the
interaction between text, item type, testing situation, and student charac-
teristics.

CHAPTER III
THE CLOZE PROCEDURE

Wilson Taylor introduced the cloze procedure to the reading field in
1953 as "a new tool for measuring readability." Taylor derived the term
"cloze" from the concept "closure" in Gestalt psychology, reasoning that
"the human tendency to complete a familiar but not-quite-finished pattern"
is comparable to supplying missing words in connected discourse (p. 415).
Though Taylor's analogy with Gestalt concepts was misleading (Rankin,
1964; Weaver, 1965; Ohnmacht, Weaver, and Kohler, 1970), most of his
procedures and conclusions about the cloze procedure have proven remarkably
durable through more than 20 years of cloze research. In addition, the
"new tool" that Taylor introduced to measure readability has been extended
dramatically in investigations of "reading comprehension, learning,
information, thinking, numerous language variables, teaching, aptitude,
readiness, listening, flexibility, and context cues" (Rankin, 1974, p. 2).

A complete bibliography of cloze research would be comprised of several
hundred items. What follows is a brief, critical review of selected studies,
concentrating on salient features of the cloze and related theoretical
issues which are germane to the analysis of comprehension as discussed in
the preceding chapter of this report. For more comprehensive reviews of
the literature on the cloze, the reader is referred to Rankin (1959, 1965,
and 1974), Potter (1968), and Fran (1972).
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Readability 

Readability formulas.¹ Conventional readability formulas measure a
small number of variables such as prepositional phrases per 100 words,
percent "hard" words, and average sentence length in specified segments of
a text and then calculate scores which indicate the grade level or levels
at which students with average reading abilities will be able to comprehend
the text. The formulas were derived by analyzing written texts for which
grade levels had been established on the basis of pupil performance and
then determining by regression analysis the relative weightings of sentence
length, hard words, and so forth that would best "predict" the grade level
or difficulty of the texts.

Once established, the formulas were used to predict the grade level of
other texts. They give teachers and publishers an estimate of the readability
or difficulty of written material without actually having students read it.
However, they have shortcomings in that they do not tell how individual
students or groups of students will respond to specific texts, and they do
not take full account of complexities of form and content which may affect
the comprehension of individuals.

With few exceptions (notably Bormuth, 1966), readability formulas
sample only two or at best three of the many stylistic variables that affect
readability. The Lorge (1939), Flesch (1948), Dale-Chall (1948), and
Spache (1953, 1960) formulas, for example, all count the average number of
words per sentence but ignore variations in sentence structure, which can
radically affect comprehensibility. For instance, scrambling the words in
a sentence would not even affect the score on most readability formulas.
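
The point about scrambled sentences can be illustrated with a minimal
sketch. The two counts below (average words per sentence and the share of
words longer than six letters) are hypothetical stand-ins for the kind of
surface variables such formulas sample; they are not any published formula,
and the function names are assumptions made for illustration only.

```python
import random


def surface_features(text):
    """Return (average words per sentence, share of words over 6 letters).

    Illustrative surface counts only; not a published readability formula.
    """
    sentences = [s.split() for s in text.split(".") if s.strip()]
    words = [w for s in sentences for w in s]
    avg_sentence_length = len(words) / len(sentences)
    long_word_share = sum(len(w) > 6 for w in words) / len(words)
    return avg_sentence_length, long_word_share


def scramble_within_sentences(text, seed=0):
    """Shuffle the word order inside each sentence, keeping sentence breaks."""
    rng = random.Random(seed)
    scrambled = []
    for sentence in text.split("."):
        if sentence.strip():
            words = sentence.split()
            rng.shuffle(words)
            scrambled.append(" ".join(words))
    return ". ".join(scrambled) + "."


passage = ("The car careened madly down the canyon road. "
           "The driver gripped the wheel.")
scrambled = scramble_within_sentences(passage)

# Both versions yield identical surface counts, although the scrambled
# version would be far harder to comprehend.
print(surface_features(passage) == surface_features(scrambled))
```

Any measure built only from counts of this kind is blind to word order,
which is the limitation the paragraph above describes.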



¹See Klare (1974) for a current review of readability formulas.

The Lorge, Dale-Chall, and Spache formulas also count the "hard" words
in a passage but are insensitive to the difficulty of words in context.
Carroll (1971) has demonstrated that the comprehensibility of words can
vary greatly with their grammatical functions (e.g., compare rank as a noun
or verb to rank as an adjective). Moreover, stylistic elements vary in
difficulty for students at different stages of language development, almost
necessitating special formulas for each level of reading ability (Smith
and Dechant, 1961). "Until the advent of the cloze test there was no
practical way to measure the comprehension difficulties of individual words
and sentences" (Bormuth, 1966, p. 85).

The cloze procedure. A standard cloze test of the readability of
printed discourse is constructed in six easy steps (Taylor, 1953): (1)
Delete every nth word (usually every fifth or more words)² irrespective
of part of speech or meaning; (2) replace every missing word with a blank
of standard size; (3) assign the "mutilated" passage to a representative
sample of the students in question; (4) ask the students to fill in the
missing words by guessing, from the remaining context, what the missing
words might have been; (5) total the exact-word replacements³ and calculate
a readability score (the percentage of correct responses) on the basis of
the total number of deletions; (6) compare the students' scores from different
passages and rank the passages in order of difficulty.
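
The mechanical steps above (deletion, blanking, exact-word scoring, and
ranking) can be sketched as follows. This is a minimal illustration under
stated assumptions, not Taylor's own materials: the function names are
hypothetical, the passage is split on whitespace, and the matching rule
ignores only case and surrounding punctuation, so the allowance for minor
misspellings is not modeled.

```python
def make_cloze(text, n=5, blank="_____"):
    """Steps 1-2: delete every nth word and insert a blank of standard size.

    Returns the mutilated passage and the list of deleted words (the key).
    """
    words = text.split()
    key = [w for i, w in enumerate(words, start=1) if i % n == 0]
    mutilated = [blank if i % n == 0 else w
                 for i, w in enumerate(words, start=1)]
    return " ".join(mutilated), key


def _normalize(word):
    # Assumption: ignore case and surrounding punctuation only.
    return word.strip(".,;:!?\"'()").lower()


def cloze_score(responses, key):
    """Step 5: exact-word replacements as a percentage of all deletions."""
    if not key or len(responses) != len(key):
        raise ValueError("one response is needed for each deletion")
    correct = sum(_normalize(r) == _normalize(k)
                  for r, k in zip(responses, key))
    return 100.0 * correct / len(key)


def rank_passages(scores_by_passage):
    """Step 6: order passages from easiest to hardest by mean student score."""
    means = {name: sum(s) / len(s) for name, s in scores_by_passage.items()}
    return sorted(means, key=means.get, reverse=True)


sentence = ("Wilson Taylor introduced the cloze procedure to the reading "
            "field in 1953 as a new tool for measuring readability")
mutilated, key = make_cloze(sentence)
print(mutilated)
print(key)
```

Steps 3 and 4 (assigning the passage and collecting guesses) are
administrative and are represented here only by the `responses` argument.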

²If the context surrounding a missing word is reduced below six to ten
words, it becomes very difficult to replace the missing word (Aborn,
Rubenstein, and Sterling, 1959; MacGinitie, 1961).

³Minor misspellings are accepted. Scoring synonyms, on the other hand,
has little effect on test reliability or validity; instead, it introduces
subjectivity, difficulty, and expense into the cloze procedure (Taylor, 1956;
Bormuth, 1967a).

An estimate of the difficulty of every word or sentence in a passage
can be obtained by constructing five forms of the test, deleting every
fifth word, beginning alternately with the first, second, or third word,
and so on, until every word in the passage has been deleted in one or
another test form (Taylor, 1956; Bormuth, 1964). Different forms of the
test are then randomly assigned to representative samples of the students
and analysis made of performance on different forms.
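
The five-form design can be sketched as below: each form deletes every
fifth word from a different starting position, so that every word is
deleted in exactly one form, and the difficulty of a word is estimated as
the proportion of students who fail to restore it. The function names and
the layout of `results` are assumptions made for illustration.

```python
def cloze_forms(text, n=5):
    """Return the words of the passage and, for each of n forms,
    the list of word positions deleted in that form."""
    words = text.split()
    forms = [[i for i in range(len(words)) if i % n == offset]
             for offset in range(n)]
    return words, forms


def word_difficulty(forms, results):
    """Estimate per-word difficulty from the students' responses.

    results[f] holds one row per student who took form f; each row is a
    list of booleans (exact-word replacement or not) aligned with forms[f].
    Returns {word position: proportion of students who missed the word}.
    """
    difficulty = {}
    for deleted, student_rows in zip(forms, results):
        for column, position in enumerate(deleted):
            correct = sum(row[column] for row in student_rows)
            difficulty[position] = 1 - correct / len(student_rows)
    return difficulty


words, forms = cloze_forms("one two three four five six seven eight nine ten")
# Hypothetical responses: two students per form, two deletions per form.
results = [[[True, False], [True, True]] for _ in forms]
print(word_difficulty(forms, results))
```

Sentence difficulty could then be taken as the mean difficulty of the words
in each sentence, though that aggregation is an assumption rather than
something the sources cited above specify here.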

Besides the ability to estimate the difficulty of every word and every
sentence in a passage, the cloze procedure has several other advantages
over readability formulas. First of all, a cloze test actually "measures"
rather than predicts the readability of a passage. More specifically, the
cloze procedure counts the number of successful, exact-word replacements
of missing words in a passage and then expresses this number as a
percentage of the total missing words. The percentage of correct responses
indicates "the extent of likeness between the language patterns used by the
writer to express what he meant and those possibly different patterns which
represent readers' guesses at what they think the writer meant" (Taylor,
1953, p. 417). Thus cloze scores represent "the proportion of predictable
material that the passage contains" (Coleman and Miller, 1968, p. 371) for
the students in question. A student's ability to guess a significant
proportion of the language used in a particular text indicates a sufficient
acquaintance with the stylistic variables and the content of the text to
be able to comprehend it with some specifiable degree of proficiency. Any
teacher or subject coordinator, using the cloze procedure, can determine
the appropriateness of a given text for a particular group of students.

Secondly, cloze scores reflect many more linguistic variables, including
syntactic complexity (Ruddell, 1964; Simons, 1970; Steiaan III, 1971) and
various stylistic devices (Bormuth and MacDonald, 1965), than readability
formulas. Taylor (1953), for example, compared the difficulty ratings of
three passages as estimated by the cloze procedure and the Flesch and
Dale-Chall readability formulas. The passages were ranked in the same order
of difficulty by all three methods, but the cloze scores showed far more
sensitivity to stylistic variables. Whereas the Flesch and Dale-Chall
formulas predicted that a passage by Gertrude Stein would be appropriate for
fourth or fifth grade students, the cloze procedure gave it a higher rating
more consistent with its obvious difficulty.

In studying the validity of the cloze as an estimate of readability,
Bormuth (1962) compared cloze scores on nine passages with multiple-choice
and sentence-completion comprehension scores on the same passages. The
correlation was .92.

Coleman and Miller (1968) used a modified cloze procedure to calibrate
36 passages for difficulty. Assuming that the amount of "new information"
that can be gained from a passage is a function of its difficulty or the
amount of predictable verbal material in a passage, they asked students to
guess each successive word in the passages. If the student guessed the wrong
word, he was corrected. Each student went through each passage twice, and
the "information gained" was the difference between the two scores. Thus
the final score reflected the difficulty of a passage or "the efficiency
with which a passage transmits new information" (p. 369).

Aquino (1969), using the same 36 passages, compared Coleman and Miller's
difficulty ratings to results from two other measures of "readability":
word-for-word recall and judgements of difficulty. The 36 passages were
ranked in the same order of difficulty by all three methods.

The cloze procedure, however, can be a bit more cumbersome than
readability formulas. For example, Dale-Chall readability scores are usually
calculated with passages of at least 100 words in length. Using the cloze
procedure, however, a reading teacher with 30 students would need a passage
of at least 750 words to get a reliable estimate of difficulty. The
reliabilities of cloze estimates of readability vary as a function of the
number of students and the number of deletions (Bormuth, 1965). "Where the
passage is very short (containing fewer than 30 . . . [deletions]), it is
doubtful that individual scores are sufficiently reliable to permit an
accurate judgement of how well a given individual understood the passage"
(Bormuth, 1967a, p. 16). Increasing deletions, and, consequently, passage
length, tends to reduce error more effectively than increasing the number of
students. As few as 40 deletions, or a passage of 200 words in length,
however, could be used with 150 students (Bormuth, 1965).

Merely ranking passages for difficulty in relation to each other does
not provide teachers with sufficient information about readability. In
1971, Bormuth attempted to develop "standards of readability" so that any
cloze score on any given passage could be interpreted independently of other
passages. He compared cloze scores to measures of "information gain" (the
difference between pre- and post-test scores) assessed by multiple-choice and
sentence-completion tests. Bormuth interpreted cloze readability scores as
follows:



Scores below 35% indicate an inability to gain "information" from the
passage. Scores between 35% and 49% indicate an ability to gain information
with instructional assistance. Scores beyond 50% represent an ability to
gain information from texts independently.

    Cloze Scores        Reading Level

    0% to 34%           Frustration Level
    35% to 49%          Instructional Level
    50% and above       Independence Level
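
The interpretation above amounts to a simple threshold rule, sketched here
for illustration. The function name is hypothetical, and the handling of
fractional scores between the tabled bands (for example, 34.5%) is an
assumption: any score below 35 is treated as frustration level and any
score below 50 as instructional level.

```python
def reading_level(cloze_percent):
    """Map a cloze score (percent exact-word replacements) to a level,
    following the cut-offs of 35% and 50% given in the text."""
    if not 0 <= cloze_percent <= 100:
        raise ValueError("a cloze score is a percentage between 0 and 100")
    if cloze_percent < 35:
        return "Frustration Level"
    if cloze_percent < 50:
        return "Instructional Level"
    return "Independence Level"


for score in (20, 35, 49, 50, 80):
    print(score, reading_level(score))
```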



In short, more than twenty years of research has established the
validity, reliability, and utility of the cloze procedure as a tool for
estimating readability. Bormuth's work on standards of readability, however,
should make it evident that there is only a tenuous distinction between the
cloze as a test of readability and the cloze as a test of comprehension.
Bormuth's study on readability, as a matter of fact, is often cited in
discussions of the validity of the cloze as a test of reading comprehension,
and his "standards of readability" are used to interpret cloze comprehension
scores (e.g., Hansen and Hesse, 1974). A cloze readability score tells a
teacher something about the characteristics of the text in relation to the
reading competency of the students, with the emphasis, as the term
"readability" implies, on the text. As a test of comprehension, the cloze
procedure generally remains identical, but the interpretation shifts from
characteristics of the text to characteristics of the student.

The tenuous distinction between readability (comprehensibility) and
comprehension, however, is not peculiar to the cloze; rather it is inherent
in the concepts themselves. Readability formulas, for example, are usually
validated with standardized reading lessons in comprehension as a criterion.
In the Lorge, Flesch, and Dale-Chall formulas, the criterion is the Standard
Test Lessons in Reading (McCall and Crabbs, 1925, 1950, 1961).

When the cloze procedure, rather than a standard, multiple-choice test
of comprehension, is used as a criterion, readability formulas "consistently
yield higher predictive validity coefficients" (Klare, 1974, p. ). This
implies that the cloze procedure has more in common with readability formulas
than standardized comprehension measures. Bormuth (1971), on the other hand,
suggests that cloze tests measure an even broader range of skills than
traditional, multiple-choice comprehension tests. But that may be a
disadvantage. Scores on traditional comprehension tests already reflect such
a broad range of psycholinguistic skills that it is nearly impossible to
specify exactly what the tests measure.

Comprehension

Syntactic cues. A student attempting to replace missing words in
connected discourse has two basic decisions to make: (1) He must decide
which part of speech is appropriate to the syntactic context and (2) which
particular word within that grammatical category is appropriate to the
semantic content. The student makes both decisions on the basis of his
knowledge of the syntactic and semantic regularity of the language.⁴ If
the sentence is within the grammatical competence of the student, he has
enough syntactic cues (i.e., the order in which the morphemes occur) to
choose the appropriate part of speech even though he may not know what the
content words mean. A nonsense sentence, retaining only the morphemes
(underlined) necessary to parse the sentence, makes the distinction between
syntactic and semantic decisions clear: "The _ lea scuokke4 tconly down tfa
eesbii rgag." The missing word obviously performs an adjectival function in
the sentence. Thus a student faced with a gap in the following sentence --
"The ____ car careened madly down the canyon road" -- has enough syntactic
cues to know that the missing word again has to behave like an adjective.
In grammatically well formed English sentences, that is, determiners like
"the" are usually⁵ followed by nouns, adjectives, or adverbs (e.g., "the
happily soused man") while verbs are usually preceded by noun phrases (nouns
or pronouns, and modifiers).⁶

⁴The assumptions are that the original sentence is grammatically well
formed, that the words are part of the lexicon of the language, and that the
particular combination of words "makes sense" to other members of the speech
community.

⁵Note the exception in this sentence: "the" is followed by an auxiliary
verb, "are."

At the very least, then, the cloze procedure assesses the student's
syntactic competence; and that competence is fundamental to the comprehension
of any sentence: "To comprehend a sentence, the reader must understand the
underlying structural relationships, i.e., the logical subject and logical
object of the sentence" (Simons, 1970, p. 33; Fodor and Garrett, 1967;
Fodor, Garrett, and Bever, 1968; Weisberg, 1971; Smith, 1975).⁷

Not only is syntactic competence fundamental to comprehension, but the
apprehension of structural relationships in a sentence is considered part
of the process of comprehension since syntactic and semantic processes are
intimately bound up with each other in language performance.⁸ It is
impossible, that is, to "assign meaning to words in a sentence without
syntactic structure of the sentence" (Miller, 1965, p. 17). The apprehension
of meaning clearly includes grammatical relationships if "meaning" is
construed

as the total disposition to make use of and react to a
linguistic form. It follows that a readiness to use
words in accordance with conventions about the parts of
speech is a part of meaning. However, it is a part that
can be distinguished from reference.⁹ (Brown, 1958, p. 118)



⁶See Chomsky (1957, 1965) for an analysis of syntactic structures.

⁷There is some experimental evidence "that perception, comprehension,
and recall of sentences is intimately connected with underlying sentences"
(Finn, 1973). See studies by Lenneberg (1967), Anderson (1973), and Fodor
and Bever (1965).

⁸Note that syntactic theory (a competence/knowledge theory) was
originally developed without recourse to semantic theory, but the discussion
here concerns the use of various competencies.

⁹The referential aspect of meaning is only peripheral to the
apprehension of meaning in connected discourse where "the interanimation of
words" (Richards, 1936/1967) predominates: "the meaning of an utterance is
not a linear sum of the meanings of the words that comprise it" (Miller,
1965, p. 18).

Identifying the apprehension of the meaning residing in structural
relationships as an essential act of comprehension as such has several
advantages over traditional, global conceptualizations of comprehension:
(1) It enables researchers to distinguish clearly between comprehension and
more rudimentary reading skills, like decoding (recognition of letters as
sounds and groups of letters as words), or word knowledge, which may be
prerequisite to comprehension. (2) It also enables researchers to specify
the relationship between the conceptualization of comprehension and actual
linguistic components.

There is some empirical evidence that cloze scores reflect syntactic
competence to a greater degree than traditional reading comprehension tests.
Simons (1970), for example, devised a "Deep Structure Retrieval Test"
(D.S.R.T.). Students were asked to identify the anomalous sentence among
three sentences, two of which were paraphrases of each other. Scores were
then correlated with cloze scores and scores on the Metropolitan Achievement
Test (M.A.T.).

The correlations between the D.S.R.T. and the Cloze
Test are significant and quite large, with more than
50% of the variance accounted for by the D.S.R.T. The
relationship between the D.S.R.T. and the M.A.T. Reading
is significant but not as great as the Cloze Test. (p. 74)

Simons concluded that

Recovering the deep structure is an important aspect
of reading comprehension. In fact Ss' skill at recovering
the deep structure of sentences is a much more important
aspect of reading comprehension skill, as measured
by a cloze test, than I.Q., word knowledge and word
recognition skill. (p. 89)

Semantic cues. The cloze procedure, on the other hand, has been
criticized as a test of comprehension on the assumption that "cloze scores
are probably more dependent on detection of grammatical, than of semantic
cues" (Carroll, 1972, p. 19). As Ramanauskas (1972) points out, however,

It is difficult to separate semantic and syntactic sources
of constraint experimentally although they can be
distinguished conceptually. Brown (1970) for example, wrote that
syntactic expectancies are guided by prior semantic
information, as in the search for a logical subject and predicate.
(p. 324)

Much more research needs to be done on the relationship between cloze
scores and the syntactic and semantic components of language. It does seem
clear, nonetheless, that grammatical cues only allow the student to pick the
appropriate part of speech for a missing word in a cloze passage. Exact-word
replacements (or synonyms), on the other hand, require an apprehension of
the semantic cues surrounding the missing words. A student "must guess
what the mutilated sentence means as a whole, then complete its pattern to
fit that whole pattern" (Taylor, 1953, p. 416).

Guessing missing words in context is not far removed from the actual
process of reading connected discourse. The student's

habits of reading cause him to anticipate words, almost
automatically, when he is receiving messages. When he
sees the start of a phrase that looks familiar, he
immediately tends to complete it in his own way even when the
written phrase actually ends differently. (Taylor, 1953,
p. 419)

Goodman (1970) describes such reading habits as a "psycholinguistic guessing
game":

Efficient reading does not result from precise perception
and identification of all elements, but from skill in
selecting the fewest, most productive cues necessary to
produce guesses which are right the first time. The
ability to anticipate that which has not been seen, of
course, is vital in reading, just as the ability to
anticipate what has not yet been heard is vital in
listening. (p. 260)

Readers can guess missing words in connected discourse not only because
of the syntactic regularity of the language but also because there is
considerable semantic redundancy in any utterance.

"Man coming" means the same as "A man is coming this way
now." The latter, which is more like ordinary English, is
redundant; it indicates the singular number of the subject
three times (by "a," "man," and "is"), the present tense
twice ("is coming" and "now"), and the direction of action
twice ("coming" and "this way"). Such repetitions of
meaning, such internal ties between words, make it possible to
replace "is," "this," "way," or "now," should any of them
be missed. (Taylor, 1953, p. 418)

Carroll (1966), distinguishing between concepts and words, ascribes the
semantic redundancy in "normal language tests" to the overlap of the concepts
(verbal and non-verbal classes of experience) "suggested by the words in a
sentence" (p. 84).

The recurrence of particular expressions in a speech community also
increases the probability of certain words occurring in specific sentences.
Taylor (1953) notes, for instance, that "'Please pass the ____'
is more often completed by 'salt' than by 'sodium chloride' or 'blowtorch'"
(p. 419). The probabilities obviously vary with the situational context.
For example, "salt" might occur more often in that sentence at the dinner
table, but "sodium chloride" might be more frequent in the chemistry lab or
"blowtorch" in the welding shop. Ordinarily, in connected discourse, the
sentence would be embedded among other sentences, further defining the
semantic content and constraining the number of words that would be
appropriate.

Taylor's examples, however, are mostly cliche expressions grounded in
social "rituals." Though all "semantic regularity" is ultimately based upon
shared experience¹⁰ (verbal and non-verbal), the cloze procedure is no less
effective when dealing with sentences that a reader has probably never
encountered before. Consider, for example, the "mutilated" sentence
introduced earlier in this discussion of the cloze procedure: "The ____
car careened madly down the canyon road." The reader has enough syntactic
cues to know that the missing word has to behave like an adjective, but the
list of appropriate words in this semantic context excludes many adjectives
from consideration (assuming that the rest of the words in the sentence are
within the reader's vocabulary and can be related to the reader's non-verbal
experience). Cars careening madly down canyon roads, for instance, are not
likely to be "supercilious" even though "supercilious" can perform the
functions of an adjective as required by the grammatical context. Imbedding
the sentence in a cohesive paragraph would further reduce the number of
adjectival expressions that would be appropriate in this sentence.

¹⁰"Shared experience" implies psycho- and sociolinguistic systems only
dimly understood at present. See Bernstein (1969).

On the other hand, cloze tests often contain missing words which are very
difficult to replace no matter how extensive the context (Fletcher, 1959;
Bormuth, 1962). Who could guess the deleted word in the following sentence,
for example, without knowing the original text? "I then took up three planks
from the flooring of the chamber, and deposited all between the ____"
(Poe, "The Telltale Heart"). (The missing word is "scantlings.")

The occurrence of both easy and difficult restorations does not present
insolvable problems for the cloze procedure.

A series of about 50 blanks is roughly sufficient to allow
the chances of mechanically selecting easy or hard words
to cancel out and yield a stable score of the difficulty
of a passage, or the performance of an individual, despite
what specific words the counting-out process may delete.
(Taylor, 1956, p. )

In addition, Bormuth (1967a) contends that very easy and very difficult
restorations of deleted words contribute to a test's validity "in testing
subjects differing widely in ability" (p. 12).

Contextual constraint. There is limited empirical evidence regarding
the extent of contextual constraint on cloze deletions. MacGinitie (1961)
varied the deletion patterns on two prose narrative passages and randomly
assigned 20 college students "to each omission set of each passage." "No
statistically significant difference was found in the difficulty of restoring
omitted words when every 24th, 12th, or 6th word was omitted, but omitting
every 3rd word made restoration more difficult" (p. 125). MacGinitie
concluded that "additional uninterrupted context beyond five words did not
help in the restoration of the missing word" (p. 127). (Note that
MacGinitie's results are only based on two passages, both 144 words long,
that his subjects are college students, and that the ten word bilateral
constraint is only an average over the two passages. No attempt was made to
identify contextual clues¹² or to relate the contextual constraint on
specific deletions to meaningful units of discourse, e.g., independent
clauses.) Aborn, Rubenstein, and Sterling (1959) also found that a context
of five to ten words was maximally effective in the replacement of missing
words. Their study, however, is based upon isolated sentences rather than
connected discourse.

¹¹Deleting every fifth word 50 times would require a passage of at least
250 words. Most cloze research, following Taylor, is based on passages of
approximately 250 words in length (Potter, 1968).

Taylor (1956) also reports that every fifth-word deletion is
"statistically independent" in cloze tests.¹³

Other studies of contextual constraint compared unilateral and bilateral
constraint (context preceding or following and context surrounding omissions).
Weaver (1962) discovered "that a context is most restrictive when a word is
embedded within it. Bilateral context seems to improve the precision of
language" (p. 153). Indeed, Coleman and Miller (1968) found "that the
bilateral constraint is so great that surprisingly little information is
added to it by reading the passage" (p. 374).



¹²Ames (1966) attempted to identify contextual clues, and Rankin and
Overholser (1969) investigated "the sensitivity of intermediate grade pupils
to contextual clues described by Ames" (p. 50).

¹³"It should be noted that cloze materials for first graders have been
modified to make it possible for them to cope with this type of task"
(Rankin, 1974, p. 6). Gallant (1965), for instance, had to use a three-option,
multiple-choice cloze to maintain test reliability in grades 1, 2, and 3.
Gove (1975) used passages of less than 75 words in length and deleted only
lexical items.

Reducing the context for interpretation to five or ten words implies
that "cloze scores are dependent chiefly on what might be called the 'local
redundancy' of a passage, the extent to which linguistic cues in the
immediate environment (generally, in the same sentence) of a missing word
tend to supply it" (Carroll, 1972, p. 18). Thus MacGinitie's evidence seems
to run counter to Taylor's contention (1953) that a student "must guess
what the sentence means as a whole, then complete its pattern to fit that
whole meaning" (p. 416). Further, a context for interpretation of five to
ten words excludes larger semantic units, "the major ideas or concepts
that run through a discourse" (Carroll, 1972, p. 19), whereas "it is typical
and natural for sentences to be comprehended as part of a larger semantic
unit" (Dooling, 1972, p. 56). Moreover, "comprehending a sentence in
context is a more complex task than comprehending a sentence in isolation"
(p. 60). Any test of comprehension, therefore, must get at larger units of
meaning than five to ten word clusters.

Qualifying his generalization about contextual constraint in the cloze,
MacGinitie (1961) writes that

Although it seems that constraints between words generally
decrease very rapidly with distance, this does not mean
that constraints never operate over distances of more than
four or five words. Also, some constraints, such as knowing
the topic of the paragraph, may have a more generalized
influence that does not decline with decreasing length of
context in an easily specifiable way. Carroll, Carton, and
Wilds (1959) report that when a paragraph is broken into
10-word segments with the 5th word in each segment omitted,
restoration is much less accurate when the segments are
presented in random order rather than in their original
order. (p. 128)
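
The segment manipulation described in the quotation can be sketched as
follows. This is an illustration under stated assumptions rather than a
reconstruction of the materials used by Carroll, Carton, and Wilds: the
function name is hypothetical, the passage is split on whitespace, and any
trailing words that do not fill a complete segment are simply dropped.

```python
import random


def segment_cloze(text, seg_len=10, omit=5, blank="_____",
                  shuffle=False, seed=0):
    """Break a paragraph into seg_len-word segments, blank out the
    omit-th word of each, and optionally present the segments in
    random order. Returns (segments, key) in presentation order."""
    words = text.split()
    segments, key = [], []
    for start in range(0, len(words) - seg_len + 1, seg_len):
        segment = words[start:start + seg_len]
        key.append(segment[omit - 1])
        segment[omit - 1] = blank
        segments.append(" ".join(segment))
    order = list(range(len(segments)))
    if shuffle:
        random.Random(seed).shuffle(order)
    return [segments[i] for i in order], [key[i] for i in order]


paragraph = " ".join("w%d" % i for i in range(1, 31))
original_order, key = segment_cloze(paragraph)
random_order, shuffled_key = segment_cloze(paragraph, shuffle=True, seed=1)
print(original_order)
print(random_order)
```

Comparing exact-word restoration rates on the two presentations would then
indicate how much of the constraint on each blank comes from beyond its own
ten-word segment.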

Ramanauskas (1972) also gathered evidence on intrasentential constraints by
assigning two cloze tasks to educable, mentally retarded students. One task
presented students with "selections containing sentences in the natural order
of discourse . . . [and] the other task involved materials wherein the
sentence order was modified by being randomly rearranged" (p. 335).
Ramanauskas found that "a significantly greater number of correct cloze
responses were produced for material having sentences in the natural order of
discourse" (p. 342). Moreover, Fillenbaum (in Potter, 1968) found that while
"form class predictability is more dependent upon the immediate grammatical
environment . . . verbatim predictability depends upon both this factor and
remote topical content or semantic features of the discourse" (p. 23). Thus
there seems to be both logical and empirical evidence for the cloze as a
measure of both small and large semantic units in connected discourse,
i.e., a measure of reading comprehension.

Correlation and factor analytic studies. In addition to this kind
of logical and piecemeal empirical evidence, many investigators have studied
the relationships between cloze scores and scores on standardized,
norm-referenced tests of reading comprehension. The data displayed in
Table 3.1 indicate, in general, a substantial correlation between such
scores. Moreover, Rankin (1965) notes that, with few exceptions,

comparisons between cloze tests and standardized
reading tests have yielded substantial correlations
even though the cloze tests were based upon a variety
of different types of reading materials and were
constructed and administered in different ways. (p. 136)¹⁴

Since standardized, norm-referenced tests of reading comprehension are
biased toward critical reading skills, it's not surprising that cloze scores,
with a significant syntactic factor, correlate substantially rather than
highly with scores on standardized comprehension tests, as indicated in Table
3.1. Nor is it surprising to discover "that correlations with cloze scores are
frequently higher for vocabulary measures than for comprehension measures"
(Potter, 1968, p. 5), as illustrated by Rankin's (1957) and Fletcher's (1959)
studies in Table 3.1. With a significant syntactic factor and a preponderance
of intrasentential constraint, cloze scores should correlate highly with
measures of comprehension between the polar extremes represented by
pre-comprehension vocabulary measures and tests of comprehension biased
toward critical reading skills.¹⁵

¹⁴Such variations, however, make it difficult to compare results from
different studies. Construct validation of the cloze as a test of
comprehension becomes even more difficult.

In any event, the kinds of correlations represented in Table 3.1 are
often accepted as indicative of the validity of the cloze procedure as a
test of "general comprehension," as it is usually called in the literature,
or, more specifically, the ability to comprehend. Moreover, as a test of
comprehension ability, the cloze has few of the liabilities of standardized,
norm-referenced tests. Test construction in the cloze procedure, for
instance, requires no particular expertise in language or testing and is
sufficiently objective (even "mechanical") to allow for the construction of
parallel test forms for periodic testing. More importantly, there are no
questions in the cloze procedure to introduce extraneous difficulties and
processes. Cloze tests are, however, cumbersome to grade since they have
to be scored by hand.

There is some conflicting evidence regarding the cloze procedure as a
test of ability in reading comprehension. Weaver and Kingston's (1963) study,
for example, as indicated in Table 3.1, is an exception to the general tendency
toward substantial correlations between cloze scores and scores on
standardized comprehension tests. After examining "the relationships of cloze
tests to standard tests of reading, listening and language symbolizing
ability," they concluded that the "cloze tests are related only moderately to
the verbal comprehension factor" (p. 259).

¹⁵Which is exactly what does happen. See the discussion of "specific
comprehension" on pages 23 and 2[?].
Table 3.1

Correlations Between Cloze Readability
Tests and Standardized Tests of Reading Achievement

Study                      Subjects     Tests                         Correlations

Jenkinson (1957)           High School  Cooperative Reading C2
                                          Vocabulary                  .78
                                          Level of Comprehension      .73

Rankin (1957)              College      Diagnostic Survey
                                          Story Comprehension         .29
                                          Vocabulary                  .68
                                          Paragraph                   .60

Fletcher (1959)            College      Cooperative Reading C2
                                          Vocabulary                  .63
                                          Level of Comprehension      .55
                                          Speed of Comprehension      .57
                                        Dvorak-Van Wagenen
                                          Rate of Comprehension       .59

Hafner (1963)              College      Michigan Vocabulary Profile   .56

Ruddell (1963)             Elementary   Stanford Achievement
                                          Paragraph Meaning           .61-.74

Weaver & Kingston (1963)   College      Davis Reading                 .21-.51

Gallant (1965)             Elementary   Metropolitan Reading          .65-.81

Greene (1965)              College      Diagnostic Reading Survey
                                          Total Comprehension         .51

Heitzman & Bloomer (1967)               Iowa Reading                  .26-.68

Geyer & Carey (1972)       Jr. High     Differential Aptitude
                                          Verbal Reasoning            .33-.86
                                        Standardized Reading Test     .53
Bormuth (1969) questioned Weaver and Kingston's interpretation of
the data "on at least four counts": (1) their subjects were a highly select
group of college students; (2) "the correlations upon which they based
their calculations differed in size from those obtained by other
investigators"; (3) "the standardized tests they used showed unusual patterns
of factor loading"; and (4) "the cloze tests showed some inconsistencies among
themselves in their loading patterns" (p. 361).

Bormuth then set out to investigate further the factor validity of cloze
tests. Nine passages of approximately 250 words each were clozed, and seven
multiple-choice tests were written on each of the passages.

    The [multiple-choice] tests were written to measure
    comprehension of vocabulary, of explicitly stated
    facts, of sequences of events, of stated causal
    relationships, of the main ideas of the passages, of
    inferences, and of the author's purpose....An equal
    number of each type of item was written for each
    passage. The items were then [...]
    samples of subjects enrolled in grades four, five,
    and six. (p. 361)

Bormuth found that "the intercorrelations were high and fairly uniform
across the different types of tests" (p. 363), and concluded that "clearly
one factor accounted for the preponderance of the variance. Further, there
was little difficulty in applying the name of 'reading comprehension ability'
to that factor" (p. 364).

Though Bormuth's study is an important contribution to cloze research,
labeling the factor upon which both types of tests loaded reading
comprehension "ability" is an unfortunate misnomer that obscures important
distinctions. Standardized comprehension tests are refined, highly developed
tests of general verbal ability. Except for the opinions of three "reading
specialists," Bormuth made no effort to validate his multiple-choice
comprehension tests. Bormuth's criterion test, in contradistinction with
standard comprehension tests, is weighted toward "literal comprehension"
(e.g., 108 vocabulary items, 63 items dealing with explicitly stated facts,
and only 36 "inferential" items). Furthermore, Bormuth compared cloze and
comprehension test scores on the same passages, resulting in a measure of
what Rankin (1963) calls "specific comprehension" or comprehension per se, as
distinguished from Weaver and Kingston's (1963) attempt to measure "general
comprehension" or "general verbal ability."

Several other investigators have constructed multiple-choice and
sentence-completion comprehension tests in order to compare cloze scores with
comprehension scores on the same passages and thereby to evaluate the cloze
procedure as a measure of "specific comprehension." Correlations between
cloze test results and comprehension scores on the same passages are
generally high, as would be expected in light of the preceding discussion.
Taylor (1957) got a correlation of .80; Jenkinson (1957), .82; Friedman (1964),
.90 to .91; and Bormuth (1962), .73 to .84 (or .93 when cloze tests and
comprehension test results were combined first and then correlated).

    In summary, the cloze procedure appears to be a
    highly valid measure of the specific comprehension
    of a particular message. In fact, it is a more accurate
    measure of specific comprehension than of general
    reading skill as measured by standardized reading tests.
    (Rankin, 1965, p. 136)

It should again be noted, however, that the criterion tests used in these
kinds of studies of the cloze procedure are seldom validated (Potter, 1968).

Information gain. The cloze procedure has also been used to measure
"information gain"¹⁶ (sometimes referred to as "knowledge," "reading," or
"learning gain"). Information gain is assessed by testing comprehension
before and after reading the passage upon which the test is based, and then
taking the difference between the two scores as a measure of information
gain. In an attempt to test "learning" (information) gain with the cloze
procedure, Taylor (1957) constructed five different test forms based upon
a long (3,240 words) technical article. As criterion measures, Taylor used
"two matched comprehension tests," one designed to measure pre-test
knowledge of the material in the article and the other to assess knowledge of
it immediately after study. He then constructed three forms of a pre-reading
cloze test and three forms of a post-reading cloze test on a 20%
sample of the same article.¹⁷ The type of deletion varied from any-word,
to "hard" words (nouns, verbs, and adjectives), to "easy" words (verb
auxiliaries, conjunctions, pronouns, and articles) on the three forms.
Students were allowed to study the article immediately before attempting to
restore missing words on the post-reading cloze. Taylor concluded that
"'any' and 'hard' [cloze tests] yielded equally significant learning gains,
ones somewhat larger than the corresponding comprehension tests did"
(p. [illegible]). In order to select a representative sample of the 3,240 word
article for the cloze tests, however, Taylor mechanically selected eight
nine-line subsamples, for a total of 650 words, and artificially joined them
together. The results are therefore suspect as a measure of the
comprehension of connected discourse.
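
The scoring arithmetic behind information gain, as defined above, can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function names are hypothetical, and the sample scores are
invented for the example, not taken from Taylor's study or any other.

```python
def information_gain(pre_score, post_score):
    """Return the post-reading score minus the pre-reading score.

    Both scores are assumed to come from the same test on the same scale.
    """
    return post_score - pre_score


def mean_information_gain(pairs):
    """Average the gain over (pre, post) score pairs for a group of students."""
    gains = [information_gain(pre, post) for pre, post in pairs]
    return sum(gains) / len(gains)


# Invented percentage scores for three students: (pre-reading, post-reading).
scores = [(30.0, 45.0), (42.0, 50.0), (25.0, 46.0)]
print(mean_information_gain(scores))
```

Because the gain is a simple difference, whatever the pre-reading test
measures is subtracted out of the post-reading score, which is the point
taken up in the footnote on information gain.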

Rankin (1957, 1959) also attempted to measure knowledge gain with the
cloze procedure and found the most significant gain scores with a modified
cloze procedure. "The correlation between the pre-cloze, noun-verb deletion
test and the criterion test was .86 (corrected for attenuation)" (Rankin,
1974, p. 137).

¹⁶The kind of "information" gained obviously depends on the kinds of
comprehension questions, so there is no more specificity inherent in the
use of "information gain" than there is in "comprehension." Subtracting
pre- from post-reading test scores does, however, allow the investigator
to reduce the measure of pre-test knowledge in test scores. If comprehension
is the assimilation of information in the text to cognitive structures in the
reader, such a reduction is absurd.

¹⁷Usually referred to as "pre-cloze" and "post-cloze" tests in the
literature. The standard cloze procedure results in a pre-cloze test.

Coleman and Miller (1968), using the standard, any-word deletion, found
that "the cloze score before reading...is measuring essentially the same
information as the cloze score after reading; the correlation between the
two was .93" (p. 374). They concluded that "the bilateral constraint is so
great that surprisingly little information is added to it by reading the
passage" (p. 37[?]). Greene (1964) also found little difference between pre-
and post-reading cloze scores when deleting any word. More research is
needed, but modifications of the cloze procedure seem to be more viable as
tests of information gain (Rankin, 1974).

Interpreting Cloze Scores. Several investigators have attempted to
develop standards for interpreting cloze scores. Bormuth (1967b) determined
that a score of 38% correct restorations on a conventional cloze test¹⁸
is equivalent to 75% on a specially constructed, multiple-choice
comprehension test on the same passage (a test of "specific comprehension"
as defined above). If the multiple-choice score is corrected for guessing,
the equivalent cloze score is [value illegible]. Since the cloze test was a
measure of "specific" rather than "general comprehension" and since the
multiple-choice test was not validated against any established comprehension
test, the results of this study cannot be generalized. In 1968, however,
Bormuth used the California Achievement Test as a criterion measure and found
that cloze scores of 44% and 57% were comparable to reading achievement
test scores of 75% and 90% respectively. Rankin and Culhane (1969)
came within an average 3.1 percentage points of replicating Bormuth's
1967 results, with greater differences toward the extremes, "particularly
toward higher multiple-choice percentage scores" (p. 197), which they
attributed to ceiling effects on Bormuth's multiple-choice test. Table
3.2 indicates the comparable scores in all three studies.

¹⁸A "conventional" or "standard" cloze test is defined as a cloze test
where passages are 250 words or more in length, every fifth word is deleted,
only exact-word replacements are scored as correct (minor misspellings
excepted), and the test is given under untimed conditions. Most cloze research
has conformed to these strictures (Potter, 1968; Rankin, 1974), hence the
label, "conventional" or "standard."

Table 3.2

Cloze Test Percentage Scores Comparable to 75% and 90%
Criterion Multiple-Choice Scores

                      Comparable Cloze Percentages

Criteria   Bormuth (1967)   Bormuth (1968)   Rankin & Culhane (1969)

75%              38               44                  41

90%              50               57                  61

In light of the fact that two of the three multiple-choice tests were
unvalidated, and that the inconsistencies in the conception and nature of
comprehension between the three multiple-choice tests affect comparable cloze
scores rather strongly, as indicated in Table 3.2, Rankin and Culhane's (1969)
conclusion that "it is now possible for teachers to interpret cloze scores
with some degree of confidence by using specific percentage points as
criteria of acceptable performance" (pp. 197-198) seems overstated.

In the most thorough study of cloze criterion scores to date, Bormuth
(1971) explored the relationships between conventional cloze tests and various
criterion measures, including "measures of information gain, rate of reading,
willingness to study, and preferences for the subject matter, style, and
level of difficulty" (p. viii). Cloze scores and multiple-choice or
sentence-completion scores were compared on identical passages.
"Comprehension" was conceptualized as "information gain," the difference
between pre- and post-reading scores on the multiple-choice or
sentence-completion tests. The notion of comprehension was further restricted
to "the information explicitly signaled" in the passages (p. 21) or to "what
is commonly called literal comprehension" (p. 117).¹⁹ Though Bormuth's model
is admittedly incomplete and tentative (p. 20), the results are generally
consistent with previous studies. The relationships between cloze scores and
information-gain scores varied considerably from grade to grade, as
illustrated in Figure 3.1. Bormuth interpreted cloze scores between [lower
bound illegible] and 49% as indicative of the appropriateness of the text for
instructional uses, and cloze scores between 50% and 70% as indicative of
independence-level textual material.

Bormuth's study, as mentioned previously, is primarily concerned with
"standards of readability," but Hansen and Hesse (1974) used these criterion
scores to interpret comprehension test scores in Madison public schools,
and the results were unexpected. Large proportions of the students seemed
to be reading below the literacy level as defined by Bormuth's criterion
scores. It should be noted, however, that Hansen and Hesse (with Bormuth
as consultant) used cloze passages of less than standard length (50 to 70
words), whereas Bormuth's criterion scores were developed with passages of
250 or more words. Moreover, Bormuth's criterion measure in the 1971 study
was again unvalidated. Much more work needs to be done on cloze criterion
scores, and, until firmly established, cloze scores must be interpreted
cautiously. Finally, research on cloze criterion scores has been limited
to the standard cloze procedure and is inapplicable to variations in format
and type of deletions.

¹⁹On the basis of this assertion alone, Bormuth's contention that cloze
tests involve a broader range of skills than those "normally identified and
measured in multiple-choice comprehension tests...[including] those that are
so complex and difficult that they fall above the upper limits of the
multiple-choice tests" (p. 32) is misleading. The notion of "information
gain" in this study is hardly comparable to the critical reading skills
assessed by standardized, multiple-choice comprehension tests.



[Figure: vertical axis, completion tests; horizontal axis, cloze.]

Figure 3.1. Regression of information gain scores on cloze scores.
(From Bormuth, J. R., Development of Standards of Readability: Toward a
Rational Criterion of Passage Performance, Final Report, Chicago: University
of Chicago, June 1971, p. 102. [ERIC Document ED 054 233].)
Deletion Types. Though most researchers have used the standard,
any-word deletion procedure when investigating the cloze, there is some
evidence that the deletion of particular parts of speech (e.g., nouns) may be
used to separate comprehension per se from general verbal abilities. In the
discussion of "information gain," for instance, it was noted that the
deletion of nouns, verbs, and modifiers ("lexical" or "content" words as
they are commonly called in the literature)²⁰ seems to produce better reading
gain scores than conventional, any-word deletions. Further, Rankin (1974)
maintains that "the almost exclusive reliance upon...[the standard] cloze
has strengthened the influence of general verbal abilities and intelligence
upon the cloze measurement of reading comprehension" (p. 3). (Many
investigators [Taylor, 1953; Rankin, 1959; Fletcher, 1959; Deutsch et al.,
1974; Ruddell, 1965; Schneyer, 1965] have found substantial correlations
between cloze scores and measures of intelligence, especially general verbal
ability.) Taylor (1953) and Rankin (1959) compared any-word deletions with
"lexical" deletions and found that correlations between cloze scores and IQ
were diminished by deleting only "lexical" words.

In a more elaborate effort to explore the relationships between types
of deletions and comprehension scores, Louthan (1965) constructed seven
different types of cloze tests from each of 24 prose passages, 500 to 600
words long, using a 10% deletion ratio,²¹ and administered the tests
to 236 seventh grade pupils. The seven kinds of deletions were any-word,
nouns, verbs, modifiers, prepositions and conjunctions, determiners, and
pronouns. A control group got passages with no deletions. The pupils
attempted to replace the missing words on the cloze passages and then
answered 12 "comprehension" questions (six "factual" and six "inferential")
on the same passages without referring back to the passages.²² Louthan
found that the deletion of "lexical" words significantly affected student
ability to answer comprehension questions and concluded that nouns, verbs,
and modifiers "are the basic meaning carriers of the written language"
(p. 297).

²⁰Any-word deletions, on the other hand, are commonly called "structural"
deletions on the assumption "that the total amount of structural meaning in
a passage would be reduced more than the total amount of lexical meaning,
if cloze tests were constructed by deleting every nth word" (Rankin, 1974,
p. 4). "Structural deletion" was an unfortunate phrase, however, and only
created confusion (Rankin, 1974).

²¹20% is "standard."

Several other investigators have analyzed deletion types in the cloze
procedure. Ohnmacht et al. (1970), for example, studied "the relationships
of flexibility of closure and speed of closure to a number of cloze tasks
representing structural [any-word], lexical, concrete, and abstract deletions"
(p. 206). Like Louthan, Ohnmacht et al. concluded that "a lexical deletion
is considered to sample a different construct, 'comprehension,' because
nouns, verbs, and adjectives seem to have a good deal to do with such
'comprehension' components as vocabulary" (p. 215). Bickley, Weaver, and Ford
(1968) also investigated the effect of deletion by grammatical categories.
When nouns, main verbs, adjectives, and function words²³ were deleted
separately, "only the deletion of nouns had a significant effect on S's
ability to supply multiple-choice answers" (p. 614). Moreover, referring to a
previous study,²⁴ Bickley et al. note that "the most deleterious condition
was the blacking out of nouns, main verbs, and adjectives [simultaneously]
leaving mostly the function word categories; Ss who had no reading paragraphs
at all did better than these Ss" (p. 614). Bradley (1970) extended the
findings by Bickley et al. to "lower grade and age levels" (p. 92).

²²It should be noted that the directions for this test administration
strengthen the memory factor in comprehension (see Carroll, 1972), necessarily
making it more difficult to answer questions when "content" words are missing.

²³"All words blacked out except nouns, main verbs, and adjectives" (p. 613).

²⁴Weaver and Bickley, 1967.

In general, then, it seems that restricting deletions to so-called
lexical words in the cloze procedure reduces the syntactic while heightening
the semantic component in cloze scores. (It is neither possible nor desirable
to eliminate the syntactic component since it is part of the process of
comprehension. Lexical words obviously carry structural as well as semantic
information.) In light of the small number of studies devoted to deletion
types, however, such an interpretation of the findings is tentative,
especially considering the vague and various notions of comprehension
evidenced in the multiple-choice criterion measures. Indeed, as Ohnmacht et
al. (1970) remark,

    The fact that responses to cloze tasks reflecting
    essentially gross deletion strategies align themselves
    with crude measures of comprehension does
    little to shed light upon the fundamental nature
    of comprehension other than to indicate that one
    can measure what passes for comprehension in more
    than one way....Researchers using the cloze procedure
    ought to give careful consideration to language
    operations and to rational operations which are implicit
    in verbal activity and they should construct deletion
    patterns which seem to relate to these operations.
    Rather than standardizing a particular cloze deletion
    type, exploration of a wider range of deletion types
    which are related to particular linguistic and
    psychological hypotheses is needed. (pp. 215-216)

Summary and Conclusion

The cloze procedure was developed as a "new tool for measuring
readability" in 1953, and more than 20 years of research since then has firmly
established the cloze as a measure of readability, comprehension, and related
areas of inquiry. As a measure of readability, the cloze has proven far
more accurate than readability formulas, but the comparison is misleading.
Any tool that measures is likely to be more accurate than one that predicts.
The accuracy of measurement, however, in no wise satisfies the need for
predictability. As a matter of fact, most passages on cloze comprehension
tests are graded for readability with the Dale-Chall formula (Potter, 1968).
As a test of readability, the cloze procedure sacrifices the convenience of
predictability for the accuracy of measurement.

But it is pointless to compare the cloze procedure with readability
formulas. The cloze is an unorthodox comprehension test which is so easy
to construct that it can be used to determine the comprehensibility
(readability) of any text for a particular group of students in short order.
Rather than comparing the cloze procedure with readability formulas, it
would be more meaningful to compare the cloze with the traditional
comprehension tests used as criterion measures in developing readability
formulas. Readability formulas, for example, yield consistently higher
predictive validity coefficients when the cloze is used as a criterion
measure, and this implies that the cloze procedure is more accurate as a test
of reading comprehension than traditional comprehension tests. The emphasis
in cloze research has gradually shifted from readability to comprehension.

As a measure of reading comprehension, the cloze has several advantages
over traditional, multiple-choice comprehension tests. As noted
previously, cloze tests are easy to construct, requiring no particular
expertise in language or testing. Moreover, cloze tests sample the syntactic
and semantic content of a passage more objectively and thoroughly than
any other comprehension test. Most importantly, the cloze procedure does
not disrupt the process of comprehension with extraneous difficulties and
processes in the form of questions which are oftentimes more difficult to
comprehend than the passage itself. On the contrary, guessing missing words
in connected discourse seems to be very similar to the way skilled readers
actually read. Reading, that is, appears to be a "psycholinguistic guessing
game." Readers make decisions about the interpretation of the
interrelationships among the words of the discourse based upon the
information they cull from the syntactic and semantic cues²⁵ in the text and
their previous verbal and non-verbal experience. These interpretations in
turn create expectations in the mind of the reader for congruous
information.²⁶ The reader then "reads ahead," "predicts" words and groups of
words he has not read on the basis of his expectations about the text. The
cloze procedure, rather than disrupting the process with extraneous skills,
only slows it down, forces the reader's attention to the linguistic
information in the text that partially governs his interpretive decisions,
the resulting expectations of congruous information, and the predictions
about the parts of the text not yet seen. The cloze procedure, that is,
forces the reader to focus on the syntactic and semantic context surrounding
the missing words; thus the attention is where it belongs in the
comprehension of printed discourse--on the "interanimation of words" in the
text and on the integrative faculties of the reader.

²⁵The reader's guesses are also predicated on the basis of his own
syntactic and semantic "systems." There is evidence that readers store the
text as paraphrase, i.e., that they process text for meaning in terms of
their own syntactic and semantic system (Lenneberg, 19[?]). If there is
little or no congruence between the syntactic and semantic systems of the
text and those of the reader, the text is of course incomprehensible to
the reader.

²⁶Good writers, of course, "play" with these expectations, delaying,
momentarily thwarting them, surprising the reader with a more inclusive
resolution than he had expected. This kind of "play" goes on within
sentences as well as in larger, thematic units.
In the process of emphasizing context, the cloze procedure may tend
to focus attention on smaller syntactic and semantic units than would
ordinarily be the case in perusing a text for meaning. A few studies, based
upon a limited number of passages, indicate that most of the syntactic and
semantic information needed to supply missing words in connected discourse
comes from the six to ten words surrounding each deletion.²⁷ The cloze
procedure, that is, seems to force conscious interpretation of units of
discourse that a skillful reader would ordinarily subsume (or "recode" or
"chunk" as the process is sometimes labeled) in larger units of meaning.
That is, "in comprehending sentences in discourse, Ss [...]
or 'schema'; they reduce the information into larger semantic units"
(Dooling, 1972, p. 60). Forcing a reader to consciously interpret smaller
semantic units, however, does not in itself indicate an insensitivity to
larger semantic units²⁸ (both "explicit" and "implicit") which bind the
sentences of a discourse into a larger unity of meaning. As a matter of
fact, disrupting the normal order of the discourse causes cloze scores to
fall off, and this surely indicates a sensitivity to context beyond the six
to ten words surrounding the missing word. The extent of sensitivity,
however, is still an open question.

Additional evidence for the validity of the cloze as a comprehension
test comes from correlational studies. Correlations between cloze scores
and scores on standardized, norm-referenced tests of reading comprehension
are generally substantial, and this is indicative of the validity of the cloze
as a measure of general comprehension or the ability to comprehend the kinds
of reading materials sampled by the test.

²⁷Whether or not this is a general feature of written English remains
to be determined. The interplay between immediate and remote content can
vary greatly from text to text, and the effect of this variation on
contextual constraint has not been studied.

²⁸e.g., themes or tone. A particular restoration might be appropriate
to its immediate syntactic and semantic context, but violate the tone of the
message as a whole. (When students, queried about their guesses, respond
that it "sounded right," their explanation may not be so superficial as it
sometimes seems.)

That such correlations are substantial rather than high is consistent
with the foregoing analysis. Standardized, norm-referenced tests of reading
comprehension are biased toward a conceptualization of reading as reasoning
and emphasize general verbal ability at the expense of more specific and
fundamental comprehension skills. Cloze scores, on the other hand, have a
strong syntactic factor. Moreover, as noted above, there is a
predominance of intrasentential constraint governing the restoration of
missing words in cloze tests. These two observations suggest that the cloze
procedure measures a lower level of comprehension than standardized,
norm-referenced tests. The tendency of cloze scores to correlate higher with
the vocabulary than with the comprehension sections of standardized tests
provides more evidence of a similar sort. The standard cloze procedure
thus seems to measure a level of comprehension somewhere between the polar
extremes represented by vocabulary and comprehension scores on standardized
tests.

Findings from studies of the cloze as a test of specific comprehension
or comprehension per se (as distinguished from general verbal ability) also
imply that cloze restorations make fewer demands on the reasoning powers of
the reader than items on standardized comprehension tests. Correlations
between cloze scores and multiple-choice comprehension scores on the same
passages are consistently high. Moreover, an examination of test items on
these specially constructed, multiple-choice comprehension tests indicates a
bias toward "literal comprehension" (because such items are easier to write
and replicate?). Thus the cloze is usually considered more valid
as a test of specific than of general comprehension and seems to get at more
fundamental comprehension skills than standardized, norm-referenced tests.

Finally, selective deletions (e.g., verbs only) in the cloze procedure
raise the possibility of identifying and manipulating the role specific
linguistic components and form classes play in the comprehension of connected
discourse. Conceptualizations of comprehension could then be stated in such
a fashion as to lead to testable hypotheses and empirical investigation.
There is already some evidence that comprehension per se can be extricated
further from general verbal abilities and reasoning processes by limiting
deletions to lexical words (nouns, verbs, adjectives, and adverbs). Deleting
only lexical words also seems to reduce the syntactic while heightening the
semantic factor in cloze scores. The multiple-choice cloze testing system
described in the next section of this report is a further extension of this
line of inquiry.
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SPP-^D CLOSE MEHCISES IN A MUMIPLE-CHOiCE FORMM 

Twenty years of cloze research has neither produced an entirely
satisfactory cloze comprehension test nor silenced the critics of the cloze.
While the standard cloze procedure has some decided advantages over traditional
comprehension tests, it also has some serious liabilities. Most cloze research
is based on the any-word, every-fifth-word deletion pattern, and the almost
exclusive use of such a pattern seems to have resulted in a measure that loads
too heavily on syntax and general verbal ability (Rankin, 1974). In addition,
the free-response cloze, where the student writes in the missing word, is not
amenable to machine scoring and makes a horrendous task out of scoring tests,
particularly with large numbers of students. The modified cloze procedure
discussed in this chapter is an attempt to respond to these and other
criticisms of the cloze procedure as a test of comprehension.

The Standard Cloze Procedure

As defined in the preceding chapter, the "standard" or "conventional"
cloze procedure is a mechanical technique for deleting every fifth word in
a text of at least 250 words and replacing the deleted words with underlined
blanks of a standard size. Students who have not been allowed to read the
original text are then asked to write in the missing words with no other
clues to their identity than the mutilated text. There are no time constraints
on the task, and only exact replacements are counted as correct.
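
The mechanical character of the standard procedure can be illustrated with a
short sketch in Python. It is not part of the original proposal: the function
names, the ten-character blank, and the decision to ignore capitalization when
scoring are assumptions made only for illustration.

```python
import string

BLANK = "_" * 10  # every blank has the same standard size (width is assumed)


def make_standard_cloze(text, every=5, min_words=250):
    """Delete every `every`-th word; return (mutilated text, answer key)."""
    words = text.split()
    if len(words) < min_words:
        raise ValueError(f"passage has {len(words)} words; need {min_words}")
    mutilated, answers = [], []
    for position, word in enumerate(words, start=1):
        core = word.strip(string.punctuation)
        if position % every == 0 and core:
            answers.append(core)
            # Keep surrounding punctuation so only the word itself is blanked.
            mutilated.append(word.replace(core, BLANK, 1))
        else:
            mutilated.append(word)
    return " ".join(mutilated), answers


def score_exact(answers, responses):
    """Count responses that reproduce the deleted word exactly.

    Ignoring case and stray spaces is an assumption of this sketch.
    """
    return sum(
        given.strip().lower() == key.lower()
        for key, given in zip(answers, responses)
    )
```

Note that no judgment enters at any point: given a passage, both the test and
its answer key follow from the counting rule alone.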

*System for Pupil and Program Evaluation and Development.

Advantages 

Question-free item type. The standard cloze procedure has several
obvious advantages over traditional reading comprehension tests of the
passage and question type. Most commentators point out the ease and objectivity
of constructing a cloze comprehension test: there are no questions to write
and no sets of distractors to produce; indeed, given a passage, there are
no subjective decisions of any kind to make. Considering the theoretical
and practical difficulties posed by the construction of traditional
comprehension test items, that is no mean advantage. It was argued in the first
two chapters of this proposal, for instance, that (1) it is incumbent upon
the test-maker to specify and control the relationship between text and
test items, else there is little possibility of determining what the test
measures, and (2) traditional test items, including the wh-item, either
control the relationship between item type and text to the exclusion of the
semantic component of the discourse, or introduce the test-writer's own
idiosyncratic interpretation of the text into the test items, thereby
sacrificing objectivity and passage dependency. In addition, the test items
themselves often introduce comprehension difficulties which are extraneous
to the test passages. If the cloze procedure can avoid such problems and
still produce a viable test of comprehension, then it is a boon to
test-makers and teachers alike.

Comparable to the reading process. Another major advantage of the
cloze procedure, less often mentioned in the literature, is that a cloze
comprehension test elicits decisions from the student which are very similar
to those decisions students ordinarily make in attempting to comprehend
printed discourse. Reading is a constructive language process; that is,
any reader has to reconstruct the meaning intended by the writer from minimal
information on the printed page. The interaction between the reader and the
text might be described in brief as follows: The reader comes to any segment
of connected discourse with expectations about what that discourse means
based upon the verbal and non-verbal context in which it occurs. Likewise,
the reader comes to any sentence in the discourse with expectations about
what it means based upon his apprehension of the meaning of previous
sentences and the verbal and non-verbal context in which they appear. Given
the expectation of meaning of a particular sort, the reader then searches
for the logical subject and logical predicate of the sentence which will
fulfill that expectation. In effect, the reader makes an hypothesis about what
a sentence means, and then samples among the linguistic clues to meaning in
the text in an effort to verify his hypothesis. (An "objective" reader is
a fiction; readers are always biased, that is, selective in their perceptions.)
If the hypothesis is quickly verified, the sampling procedure can be very
attenuated. If, on the contrary, the hypothesis is not immediately verified
(and readers can be remarkably blind to contradictory information), the
reader may sample more extensively in an attempt to verify his original
hypothesis or may change his hypothesis and sample again. And so on.

Now the demands made by the cloze procedure on the student are not far
removed from the psycholinguistic processes implied by this model of reading
as a constructive language process. The cloze procedure does not unduly
disrupt the reading process. That is to say, a competent reader is always
"reading ahead," making predictions about what a given sentence should mean, and
then sampling the linguistic clues to meaning in the sentence in order to
verify his predictions. The cloze procedure, in a comparable manner, asks
the student to predict the meaning and identity of words in a discourse based
upon the student's apprehension of the meaning of previous segments of the
discourse and the verbal and non-verbal context in which they occur (i.e.,
the test directions, the testing situation, etc.). Reconstructing the
surface structure of the discourse exactly as the writer had
intended it, even though 20% of the words are missing, is made possible
by the natural redundancy of language, the well-formedness of the discourse,
and the shared psycho- and sociolinguistic systems of reader and writer.
These are, of course, requisite conditions for the comprehension of any
discourse, deletions or no; so the cloze procedure requires no peculiarly
redundant texts.

Specificity. In addition to ease and objectivity of test construction
and a general similarity in the demands that both the reading process and the
cloze procedure make upon readers, the cloze procedure also makes it possible
to identify and control the interaction between text and item type (deletion
type and rate). Analysis of deletion rates, for instance, indicates that
most of the syntactic and semantic information needed to replace a missing
word is found in the six to ten words surrounding the deletion. More
precisely, the information needed to replace function words (e.g., prepositions,
determiners) is generally found in closer proximity to the deletion than the
information needed to replace "content" words (e.g., nouns, verbs) (Fillenbaum,
Jones, and Rapoport, 1963). Thus the extent of the verbal context within
which interpretive decisions are made can be specified in the cloze procedure.
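
What it means to "specify" the verbal context can be shown with a small
Python sketch. It is illustrative only: the function name and the even split
of the window between left and right context are assumptions, and the default
width of ten words is simply the upper end of the range cited above.

```python
def context_window(words, index, total=10):
    """Return the words to the left and right of the deletion at `index`.

    `total` is the size of the window, split evenly on both sides
    (an assumption of this sketch).
    """
    half = total // 2
    left = words[max(0, index - half):index]
    right = words[index + 1:index + 1 + half]
    return left, right
```

Narrowing `total` for function words and widening it for content words would
mirror the proximity difference reported by Fillenbaum et al.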

Moreover, the relative influence of syntactic and semantic clues in the
text on cloze scores can be determined. In the standard cloze procedure, for
instance, the any-word, every-fifth-word deletion pattern produces a
preponderance of the syntactic component of the text in cloze scores simply because
most sentences have a greater proportion of syntactic than semantic clues.
Observations of this kind have led Rankin (1959, 1974) to dub the standard
cloze "the structural cloze." Correlations with measures of the
comprehension of syntactic structures (Simons, 1970; Stedman III, 1971) also indicate
a strong syntactic factor in cloze scores. Since an apprehension of the
syntactic structure of a sentence is fundamental to its comprehension, the
standard cloze procedure measures a more basic, identifiable level of
comprehension than standardized comprehension tests.

Disadvantages

Local redundancy. What is perceived as an advantage from one
perspective, however, can just as readily be characterized as a disadvantage from
another. Carroll (1972), for example, has criticized the standard cloze for
its dependence on syntactic cues and insensitivity to the train of ideas that
runs through a discourse and binds it together. Brown (1970) also identifies
the standard cloze as a more rudimentary measure than comprehension
(assimilation to cognitive categories). Though it is impossible to separate syntax
from comprehension, in general, it does seem to be true that the standard,
any-word, every-fifth-word deletion pattern produces a measure of
comprehension unduly weighted toward syntax and thus unduly dependent on local
redundancy.

Exact-word-only. In comparing the cloze procedure to a model of reading
as a constructive language process, it was noted above that the standard cloze
procedure requires a student to predict not only the meanings of words but
their specific identities. The difference between predicting meaning and
exact-word-only replacements of missing words marks a clear line of
demarcation between the standard cloze and the model of reading as a constructive
language process. While the reader ordinarily tries to reconstruct the
meaning of a written message as represented by the orthographic system on the
printed page, he may do so in terms of his own syntactic and semantic
structures. That is, when asked to recall the meaning of an utterance, a
student will often reply in terms of his own competence: he reconstructs the
meaning in his own language patterns. Slobin and Welsh (1967), for example,
cite the following exchange between a model and a two-and-one-half-year-old
child:

Model: This one is the giant, but this one is little.

Child: Dis one little, annat one big. (p. 8)

(Fillenbaum [1970] cautions, however, that memory and comprehension are
easily confused in such analyses.) Moreover, the sampling procedure of the
model of reading as a constructive language process also implies a rough
match (rather than exact replication) between the surface structure
represented in the orthography on the printed page and the reconstructed message in
the mind of the reader. Finally, it was posited in the second chapter of
this proposal that the surface structure of language is never more than an
approximation of the meaning intended by the writer or the meaning apprehended
by the reader.

The standard cloze procedure, on the other hand, demands that a reader
not only reconstruct the meaning of a message from the clues in the text,
but that he reconstruct exactly the same orthographical representation of the
meaning intended by the writer. Now that is clearly demanding something in
addition to "comprehension." Rankin (1974), as a matter of fact, has
cautioned teachers against trying to justify exact replacements when using the
cloze procedure as a teaching device. Goodman (cited in Fiske, 1975) has also
warned teachers against "correcting" student approximations of a text while
reading. Standard cloze scores therefore seem to indicate something more
than a student's apprehension of meaning in connected discourse.

Taylor (1953), who brought the cloze procedure to the attention of the
reading field, was quite clear about the relationship between cloze scores
and the apprehension of meaning: The percentage of correct responses on a
standard cloze test indicates "the extent of likeness between the language
patterns used by the writer to express what he meant and those possibly
different patterns which represent readers' guesses at what they think the
writer meant" (p. ^H?). The match between the language patterns of the
writer and reader is a greater demand than comprehension normally makes upon
a reader. As a consequence, cloze scores are usually quite low in
comparison to scores on traditional comprehension tests on the same passages.

Passage length. Moreover, the great range in the difficulty of
replacing individual words in a cloze passage makes it necessary to use passages
of 250 or more words so that the test score reflects a measure of the average
difficulty of the passage. But passages of that length make domain-referenced
testing difficult: few representative passages could be used in the time
available in any testing period. Furthermore, a great (haphazard) range in
difficulty indicates that the test-maker is incapable of specifying exactly
what the test measures, since the interaction between text and deletion is not
sufficiently specified or controlled. While the standard cloze procedure
has obvious advantages over traditional comprehension tests, it still amounts
to another global measure of "comprehension," whatever that is.

Hand scoring. In the standard cloze procedure, students have to write
in the missing words, and test administrators must score each test laboriously
by hand. Nothing so reduces the utility of the cloze as the necessity of
hand scoring. Until a viable cloze procedure is developed in the
multiple-choice format, the cloze procedure will be relegated to use in small
classrooms only.

The Modified Cloze Format of the SPPED Cloze Exercises



The discussion of standardized, norm-referenced reading comprehension
tests, wh-items, and the standard any-word deletion type of the conventional
cloze procedure should make it evident that none of these item types produces
a satisfactory test of literal comprehension. Among the three item types
discussed, however, the cloze procedure clearly offers the best possibility
for objectively and thoroughly sampling the student's apprehension of "the
grammatical and semantic relations which obtain within and among the sentences
of the discourse."

Moreover, the cloze procedure offers the test-maker the opportunity to
identify and control the interaction between characteristics of the student,
the text, the item type, and the testing situation. Any test of reading
comprehension should identify and control the interaction of such characteristics
in order to specify what the test actually measures, but the need for an
explicit construct becomes particularly acute when an attempt is made to label
a test or subsections of it according to the "level" of comprehension (e.g.,
literal, inferential) it attempts to assess. The state of psycholinguistic
knowledge, however, allows for nothing more than a first, tentative effort
to identify and control such interacting characteristics. What follows, then,
is (1) a brief, condensed, and tentative statement of a construct of literal
comprehension based upon the evidence and analyses adduced in preceding
chapters of this proposal, (2) a rationale for the modified cloze format adopted
in the SPPED cloze exercises based upon that construct, and (3) a description
of the actual construction of those exercises.

A Tentative Construct of Literal Comprehension

Since reading comprehension was defined in Chapter II as the
apprehension of the meaning(s) of written discourse, it is evident that comprehension
is a successful synthesis of the competence of the student with the demands
made upon that competence by the characteristics of the written discourse in
question. In the testing situation, the particular demands made upon the
student's competence are principally controlled by the test items. Contingent
circumstances external to the student which affect a correct synthesis of
features of the text accessed by the test items and the competence of the
student are considered part of the test situation in this analysis. In
addition, the dispositional limitations of the student can affect either the
attainment of the requisite competence or the use of such competence in the
testing situation.

Characteristics of the student. It is assumed by the construct that
students who can comprehend at the literal level have gone through the
normal stages of cognitive and linguistic development appropriate to their
chronological age group. More specifically, it is assumed that these students
have no physical or psychological impairments that hinder normal language
development, reading ability, or test performance. IQ's of these students are
assumed to be 85 or higher. If any of these assumptions is violated, then
students may not perform as predicted below.

It is hypothesized that comprehension at the literal level demands the
same competencies and no other competencies on the part of the student at
any grade level. Two of these competencies are a general knowledge of
standard-English-speaking societies* and a basic competence with the English
language.** Basic linguistic competence subsumes (a) lexical knowledge, (b) a
semantic rule system for selecting appropriate senses from the meanings of
lexical items, and (c) a syntactic rule system for interrelating selected
lexical senses. A third competency is the ability to recognize and
differentiate between different orthographic representations of different lexical
items and their sequential appearances in unique combinations in print as
sentences of the English language.

*A distinction is made here between language- and culture-specific
competencies. Speakers of standard English as a second language, for example,
may have the linguistic competence to comprehend many texts at the literal
level, but culture-specific knowledge, with which writers usually assume readers
are acquainted and which they therefore fail to state explicitly, may, on occasion,
thoroughly confound the non-native speaker's efforts to comprehend literally
a given message.

Characteristics of the text. The literal comprehension of connected
discourse assumes that (1) the text in question is in fact connected
discourse. The reading materials must be grammatically and semantically
well-formed sentences in standard English that pursue a particular topic,
situation, description, idea, etc., coherently; that is, there is no willy-nilly
introduction of new topics, ideas, etc., from sentence to sentence in the
text. In addition, it is assumed that (2) the text has in fact a literal level
to be comprehended; that is, that the discourse is not so excessively
idiomatic, metaphorical, or esoteric as to confound deliberately any effort to
comprehend it at the literal level. If either of the above assumptions is
violated, then students may not perform as predicted below.

If the student has the competencies outlined above and the text does
not make excessive demands upon those competencies, then the student, properly
motivated and given the opportunity and conditions to do so, will comprehend
the text at the literal level. That is, the student will apprehend "the
grammatical and semantic relations which obtain within and among the sentences"
of the text. If the student fails to comprehend literally, then the text
has exceeded his psycholinguistic competency. That is, the syntactic
structures in the text are too complex for the student to "parse," and/or the
vocabulary and concepts in the text surpass his lexical knowledge.

**The construct is stated in terms of the English language but could
easily be restated in terms of any other language or in general terms.

Characteristics of the test items. The apprehension of the literal
meaning of connected discourse is defined as the apprehension of "the
grammatical and semantic relations which obtain within and among the sentences of
the discourse." That is, the context for interpretation is limited, insofar
as possible, to the sentences of the discourse itself. Such a definition of
literal comprehension is considered the lowest possible synthesis of the
linguistic components of the text with the psycholinguistic competency of the
student that can be labeled comprehension of connected discourse without
violating conventional understanding of "comprehension" or "connected discourse."
Test items which purport to measure literal comprehension, therefore, must
access "the grammatical and semantic relations which obtain within and among
the sentences of the discourse" and only those relations. Passage dependency,
then, is essential to literal comprehension test items. That is, if the
grammatical and semantic information necessary to the selection of the
correct response from among a set of responses is present in or implied by
the set of responses or other unspecified contextual features, then the
correct response cannot be cited as evidence of literal comprehension. Moreover,
test items must sample objectively and adequately among the grammatical and
semantic relations of a discourse, or test scores may not be cited as evidence
of the literal comprehension of that discourse. For example, if the item
type subordinates the semantic component of the discourse to its syntactic
component, then correct responses to such an item type may not be presented
as evidence of literal comprehension.

If, however, the item type accesses only the grammatical and semantic
relations of the discourse, and does so thoroughly and objectively, then
correct responses to such item types can be cited as evidence of literal
comprehension and no other "level" or "degree" of comprehension. That is, if
literal comprehension is defined as the lowest possible synthesis of the
linguistic components of the whole text and the psycholinguistic competence
of the student, and if the test items only access such a synthesis, then
these test items will measure literal comprehension or no comprehension and
nothing else.

Characteristics of the testing situation. All those criteria that
normally apply in testing situations in order to elicit the best possible
performance from the student (e.g., appropriate time of day, minimum of
distractions, etc.) are assumed by the construct, literal comprehension. Two
additional assumptions are stressed: (1) Students must be familiar with the
item type. Unconventional item types obviously require training, or students
may not perform as predicted. (2) Students must have sufficient time to work
carefully through all the test passages. Students who are rushed through
passages on a test may not perform as predicted. Other than the motivational
factor, the test situation should be essentially neutral to the construct of
literal comprehension.

Rationale for the Modified Cloze Format Used in the SPPED Cloze Exercises

Deletion type. Only nouns, verbs, adjectives, and adverbs are deleted*
on the assumption that (1) such words carry most of the information which is
unique to any given discourse. Function words (determiners, prepositions,
auxiliary verbs, etc.) certainly convey information too, but such information
is mostly structural; that is, function words primarily define the
interrelationships between the appropriate senses of the lexical items of the
discourse. Moreover, though syntactic analysis is vital to the comprehension
of any verbal message, the information communicated by syntactic analysis
alone is not unique to a particular message. Syntax rarely tells the reader
anything he does not already know (Katz, 1972).

*Only nouns and verbs are deleted in grade 1 and 2 materials.

(2) Research cited in Chapter III also indicates that the deletion of
nouns, verbs, and modifiers reduces correlations between cloze scores and
measures of general verbal ability or IQ while simultaneously increasing
correlations with measures of "specific comprehension" and "information gain."
Substantial or high correlations with IQ are antithetical to the construct,
literal comprehension. In addition, criterion measures used in studies of
"specific comprehension" and "information gain" tend toward literal
comprehension in conceptualization.

(3) Fillenbaum et al. (1963) found that the grammatical and semantic
information necessary to replace nouns, verbs, and modifiers in connected
discourse tended to be further removed from the deletion than the information
needed to replace function words. Topical content, for instance, which is
more likely to affect the particular choice of nouns, verbs, or modifiers in
a given text, is liable to be dispersed throughout the text. It is assumed,
therefore, that the restriction of the deletion type to nouns, verbs, and
modifiers will obviate some of the criticism of the cloze as a measure of
local redundancy and increase the semantic component of the text in cloze
scores without eliminating the syntactic component. Nouns, verbs, and
modifiers also convey structural information, albeit to a lesser degree than
function words.

(4) Finally, deleting only nouns, verbs, adjectives, and adverbs makes
it feasible to construct a cloze test in the multiple-choice format. That is,
relatively few words can function as determiners or auxiliary verbs in an
English sentence; making up distractors for "the," for instance, seems pointless.
On the other hand, the number of words that can function as nouns in an
English sentence is enormous. In addition, deleting nouns, verbs, and modifiers
makes it possible to select distractors that are specific to content areas
(e.g., social studies), a prerequisite to viable distractors in a
domain-referenced test.

Deletion rate. An every-fifth-word deletion rate (20% of the text) is
considered optimum because it samples the grammatical and semantic relations
of the text objectively and as thoroughly as possible without depriving the
student of the information he needs to replace the deleted words. Selective
deletion types (e.g., nouns), however, force some variation in deletion rate.
For example, the test-maker who is trying to maintain an every-fifth-word
deletion rate while deleting only nouns, verbs, and modifiers often counts
five words in a text and finds no candidate for deletion. If he backs up
below three words between deletions, it becomes very difficult to replace
missing words with so little remaining immediate context. On the other hand,
if he counts too far in the other direction, then the thoroughness of the
sample of the grammatical and semantic relations begins to suffer. A
test-maker cannot readily reject passages on the grounds that they create
difficulties for the adopted deletion pattern, else the passages will represent a
biased selection from the domain of reading materials. Consequently, the
every-fifth-word deletion rate is adhered to as much as possible: there are
never fewer than three words between deletions; occasionally there are as
many as eleven words between deletions.
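
The counting rule just described can be sketched in Python as follows. This
is a hypothetical reading of the rule, not the procedure actually used for the
SPPED exercises: the tiny list of function words stands in for a real
form-class analysis, and the order in which alternative positions are tried
(ideal position first, then backing up, then counting forward) is an
assumption.

```python
# Hypothetical, deliberately tiny stoplist standing in for a real
# form-class analysis; any word not listed is treated as lexical.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "at", "by", "for", "with",
    "and", "or", "but", "is", "are", "was", "were", "be", "has", "have",
    "had", "will", "can", "it", "he", "she", "they", "this", "that",
}


def is_lexical(word):
    return word.lower() not in FUNCTION_WORDS


def choose_deletions(words, target=5, min_between=3, max_between=11):
    """Return indices of words to delete, aiming at every `target`-th word."""
    deletions = []
    last = -1  # index of the previous deletion (-1 before the first one)
    while True:
        ideal = last + target
        # Try the ideal slot, then back up (never leaving fewer than
        # `min_between` words between deletions), then count forward
        # (never leaving more than `max_between`).
        candidates = [ideal]
        candidates += range(ideal - 1, last + min_between, -1)
        candidates += range(ideal + 1, last + max_between + 2)
        pick = next(
            (i for i in candidates
             if 0 <= i < len(words) and is_lexical(words[i])),
            None,
        )
        if pick is None:  # end of passage, or no eligible word in the window
            return deletions
        deletions.append(pick)
        last = pick
```

In this sketch a run of text with no lexical word inside the allowed window
simply ends the selection; how such a stretch was handled in practice is not
stated in the source.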

Distractors. All distractors are (1) grammatically plausible and (2)
semantically implausible. (1) Grammatical plausibility means that any
distractor can perform properly the grammatical function assigned to it by
the syntactic position of the missing word for which it functions as
distractor. For example, if a noun is deleted, the distractors are usually
"nouns" or words that can behave like nouns in the syntactic position in
question. Distractors are grammatically plausible in order to reduce further the
syntactic component in test scores, since grammatical meaning, as such, does
not account for very much of the information which is unique to the text in
question and should not provide too definitive a tag for choosing among the
sets of responses. On the other hand, syntactic analysis is fundamental to
the comprehension of the interrelationships among the words of any sentence,
and no test of comprehension should or could short-circuit that analysis.
Consequently, the sets of responses are presented in a way that encourages
the student to process the grammatical and semantic information in the text,
to make an initial hypothesis about the meaning and identity of the
missing word before looking for verification of that hypothesis among the sets
of responses. In other words, the attempt is made to preserve the best feature
of the standard cloze procedure (its similarity to the reading process) while
introducing a multiple-choice format. In summary, syntax is present in the
modified cloze format adopted in the SPPED Cloze Exercises, but it is not
preponderant in cloze scores.

(2) In the set of responses associated with any deletion, only the
correct answer, the exact word in the original text, is semantically plausible.
Again, the attempt is made to preserve the features of the standard cloze
procedure while weeding out its liabilities. It was noted in the first
section of this chapter, for instance, that requiring students to supply exact
replications of missing words in the standard cloze procedure is antithetical to
the model of reading as a constructive language process. On the other hand,
in that search for the basis of the commonality of meaning intended by the
writer and apprehended by the reader which is literal comprehension, it was
posited that the common meeting ground between writers and readers is the
orthographic representation of meaning on the printed page, and therefore
that representation is the closest possible approximation of what the writer
meant. Accordingly, the modified cloze procedure maintains the
insistence on exact-word-only replacements. The student who makes a correct
hypothesis about the meaning of a missing word based upon his apprehension of
"the grammatical and semantic relations which obtain within and among the
sentences of the discourse" will have no difficulty modifying his hypothesis
about the surface representation of that meaning when confronted with the
correct answer among a set of distractors. The distractors only behave like
traditional distractors when the vocabulary level or the syntactic complexity
of the passage exceeds the student's competence, that is, when the student
can no longer comprehend at the literal level.
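
A minimal sketch of how such an item might be represented and machine-scored
is given below in Python. The class and function names, the use of three
distractors per item, and the shuffling of response order are all assumptions
made for illustration; the choice of distractors themselves (same form class
as the answer, semantically implausible in context) remains a judgment of the
test-maker and is not automated here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ClozeItem:
    answer: str         # the exact word deleted from the original text
    distractors: tuple  # same form class as the answer, chosen by hand

    def options(self, rng):
        """Return the response set in a shuffled order."""
        choices = [self.answer, *self.distractors]
        rng.shuffle(choices)
        return choices


def score(items, responses):
    """Count items on which the student chose the exact original word."""
    return sum(
        1 for item, chosen in zip(items, responses) if chosen == item.answer
    )
```

For example, for the blank in "The boat drifted slowly down the ____," an
item such as `ClozeItem("river", ("pencil", "theory", "blanket"))` offers four
nouns, only one of which is semantically plausible; scoring reduces to a
comparison that a machine can perform.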

No attempt is made to tamper with the approximation of the meaning
intended by the author. No attempt is made to interpret the text for the
student, to impose the test-writer's own idiosyncratic interpretation of the
text on the text in the form of distractors that compete with the correct
response. Such semantic competition, such alternate possibilities for
interpretation, are antithetical to the construct, literal comprehension, which is
rooted in commonality of interpretation rather than nuances of meaning.
Cranney (1972) found, for instance, that semantically plausible distractors in a
multiple-choice cloze format introduce significant numbers of items into the
cloze test that are even more difficult than the hardest items on standard
cloze tests. Semantically plausible distractors extend the context for
interpretation beyond the grammatical and semantic relations of the discourse, which
is, again, antithetical to the construct, literal comprehension. It is
hypothesized that semantically plausible distractors will also maintain or
increase correlations between the cloze procedure and measures of general
verbal ability or IQ. Such correlations are also antithetical to the
construct, literal comprehension. Finally, semantically plausible distractors,
with an emphasis on nuances of meaning, make the cloze procedure into a very
difficult vocabulary test. Literal comprehension does not demand so
extensive or refined a vocabulary.

The Construction of the SPPED Cloze Exercises
The construction of the SPPED Cloze Exercises was undertaken in order
to test the efficacy of the multiple-choice cloze format as a measure of
literal comprehension and as a means of readily implementing, with
reproducible test items, the concept of domain-referenced testing. At the
outset, a plan was devised for the systematic sampling of reading materials
in four domains in which students are expected or required to read. The
domains are:

1. Textual Material in Reading/Literature, Language Arts, Social
Studies, Science and Mathematics;

2. Citizen Material (newspapers and news magazines);

3. Consumer Material (catalogs, advertising, instructions, and
so forth); and

4. Reference Material (test instructions, children's magazines,
encyclopedias, and so forth).

Selection of cloze passages. Textual materials were to be sampled at
each grade level, from 1 through 10. Materials in the other domains were
to be assigned to grade levels on the basis of readability scores. Quotas
were established for the number of samples to be collected, at random, for
each grade and domain. The resources used for the sample collection were
the New York State Education Department's Curriculum Laboratory and the
State Library. In addition, some consumer passages were taken from
A Pilot Reading Literacy Assessment of Madison Public School Students
(Hansen and Hesse, 1972).

The selection procedures resulted in the identification of 1,374
passages that were coherent and of specified lengths appropriate for clozing.
Their distribution by domain and grade level is shown in Table 4.1. Table
4.1 also shows the distribution of the textual materials by subject matter.

Determination of Readability. All of the passages--those in the
textual domain as well as those in the citizen, consumer, and reference
domains--were subjected to readability calculations so that they could be
ordered by difficulty for test construction purposes. The readability
formulas used were the Spache (1953, 1960) and the Dale-Chall (1948). (As
noted previously, the cloze itself has advantages over conventional formulas
as a measure of readability. However, the Spache and Dale-Chall are widely
used measures, and their utility as rough indices of difficulty is borne
out by the results of the initial use of the cloze passages reported in
Chapters VIII and IX.)

The Spache is normally used for grades 1 through 3, the Dale-Chall
for grades 4 through 12 and college. Both formulas use average sentence
length and percent of "hard words" in calculating difficulty. "Hard words"
are those not appearing on lists of familiar words. The word list for the
Spache formula is "Clarence Stone's Revision of the Dale List of 769 Easy
Words"; for the Dale-Chall formula it is the "Dale List of 3,000 Familiar
Words." (The criteria for difficulty used in devising both formulas were
graded reading materials.) The Spache formula produces grade level scores.
The Dale-Chall formula produces raw scores interpreted as "corrected grade
levels." The corrected grade level for a raw score of 5.0 to 5.9 on the
Dale-Chall, for example, is fifth to sixth grade.
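
The Dale-Chall calculation described above can be sketched as follows. This is a minimal illustration, not the procedure used in the project: the constants are the commonly cited ones from the published 1948 Dale-Chall formula and do not appear in this report, the function names are hypothetical, and the grade bands follow the published correction table, which agrees with the 5.0 to 5.9 example given in the text.

```python
def dale_chall_raw_score(percent_unfamiliar_words: float,
                         average_sentence_length: float) -> float:
    """Raw score from the 1948 Dale-Chall formula.

    percent_unfamiliar_words: percent of words not on the Dale list of
        3,000 familiar words (0-100).
    average_sentence_length: mean number of words per sentence.
    The constants are the commonly published ones (an assumption here;
    they are not quoted in the report).
    """
    return (0.1579 * percent_unfamiliar_words
            + 0.0496 * average_sentence_length
            + 3.6365)


def corrected_grade_level(raw_score: float) -> str:
    """Map a raw score to its 'corrected grade level' band."""
    bands = [
        (5.0, "4th grade and below"),
        (6.0, "5th-6th grade"),
        (7.0, "7th-8th grade"),
        (8.0, "9th-10th grade"),
        (9.0, "11th-12th grade"),
        (10.0, "13th-15th grade (college)"),
    ]
    for upper_bound, label in bands:
        if raw_score < upper_bound:
            return label
    return "16th grade and above (college graduate)"
```

For instance, a passage with 10 percent unfamiliar words and an average sentence length of 15 words receives a raw score just under 6.0, which falls in the fifth to sixth grade band.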



¹Another 120 passages have been added to the citizen domain, and expansion
of the textual domain into college levels is anticipated.

Table 4.1





Domain          Subject matter       Number of passages

Textual         Reading                     354
                Lang. Arts                  160
                Math                        160
                Sci.                        160
                Soc. St.                    220
                Total                     1,054
Citizen                                     120
Consumer                                    100
Reference                                   100

Grand total                               1,374


The range of Spache scores was divided into six equal intervals and
the range of Dale-Chall scores was divided into 22 equal intervals. This
gave 28 difficulty levels covering grades 1 through college. The raw scores,
difficulty levels, and original grade level interpretations given by Dale-
Chall and Spache are shown in Table 4.2.

Use of the readability formulas disclosed wide ranges of difficulty
among instructional materials at given grade levels. Both extremely easy
and extremely difficult passages that differed markedly from other materials
for the same grade were eliminated in the selection process. However,
there is still variation in the number of difficulty levels covered by
grade levels in the textual domain. The grade level of the source was
used as the guide in application of the cloze procedure. Both grade level
and difficulty level are indicated by the identification number for each
passage.
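
The conversion of a readability score into one of the 28 difficulty levels can be sketched as below. This is an illustrative reading of the equal-interval rule, not the project's own program. It assumes six Spache intervals of width 0.5 beginning at 1.0 (the lower bound is an assumption) and 22 Dale-Chall intervals of width 0.25 beginning at 4.50, which is the layout Table 4.2 appears to follow; the function name is hypothetical.

```python
import math


def difficulty_level(formula: str, score: float) -> int:
    """Return the difficulty level (1-28) for a readability score.

    Assumed layout (labeled assumption, see lead-in):
      "spache":     scores 1.0-3.9 in six steps of 0.5   -> levels 1-6
      "dale-chall": scores 4.50-9.99 in 22 steps of 0.25 -> levels 7-28
    """
    if formula == "spache":
        low, width, first_level, count = 1.0, 0.5, 1, 6
    elif formula == "dale-chall":
        low, width, first_level, count = 4.5, 0.25, 7, 22
    else:
        raise ValueError(f"unknown formula: {formula!r}")

    # The small epsilon guards against floating-point error at boundaries.
    index = math.floor((score - low) / width + 1e-9)
    if not 0 <= index < count:
        raise ValueError(f"score {score} is outside the {formula} range")
    return first_level + index
```

Under these assumptions a Dale-Chall raw score of 5.0 maps to level 9 and a Spache score of 3.5 maps to level 6.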

Table 4.2
Difficulty Levels for Cloze Passages

Original grade level
assignments by Spache     Readability                   Difficulty
and Dale-Chall            formula        Score          level

1                         Spache         1.0-1.4         1
                                         1.5-1.9         2
2                                        2.0-2.4         3
                                         2.5-2.9         4
3                                        3.0-3.4         5
                                         3.5-3.9         6
4                         Dale-Chall     4.50-4.74       7
                                         4.75-4.99       8
5-6                                      5.00-5.24       9
                                         5.25-5.49      10
                                         5.50-5.74      11
                                         5.75-5.99      12
7-8                                      6.00-6.24      13
                                         6.25-6.49      14
                                         6.50-6.74      15
                                         6.75-6.99      16
9-10                                     7.00-7.24      17
                                         7.25-7.49      18
                                         7.50-7.74      19
                                         7.75-7.99      20
11-12                                    8.00-8.24      21
                                         8.25-8.49      22
                                         8.50-8.74      23
                                         8.75-8.99      24
13-15 (College)                          9.00-9.24      25
                                         9.25-9.49      26
                                         9.50-9.74      27
                                         9.75-9.99      28

Preparation of cloze items.² The procedure for word deletion in the
cloze passages varied with the grade of the source. In grade 1 and 2
materials, every eighth word was deleted, and deletions were limited to
nouns and verbs. For grade 3 and above, every fifth word was deleted.
Deletions included adjectives and adverbs as well as nouns and verbs.

In all cases, the initial deletion was made between the sixth and
tenth words. The exact starting point was determined by a table of random
numbers. The number of deletions per passage was fixed by the passage
length, which varied by grade level. The number of alternatives in the
multiple-choice responses also varied by grade level: three alternatives
at grade 1, four at grades 2 and 3, and five at grades 4 and above. These
variations by grade level are summarized in Table 4.3, Specifications for
Cloze Passages and Test Items.

²Only the briefest summary of the modified cloze procedure is given
here. See Appendix A for a complete description of the passage selection
and item-writing procedures.
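
The deletion rules just described can be sketched in code. This is only an illustration under stated assumptions: the report does not say here how an ineligible word (one that is not of a deletable part of speech) at the target position was handled, so the sketch simply moves forward to the next eligible word and counts the next interval from the word actually deleted. The function name, the `is_eligible` callback, and the default cap of ten deletions are hypothetical conveniences, and a pseudo-random generator stands in for the table of random numbers.

```python
import random
from typing import Callable, List, Optional, Sequence


def deletion_positions(words: Sequence[str],
                       grade: int,
                       rng: random.Random,
                       is_eligible: Optional[Callable[[str], bool]] = None,
                       max_deletions: int = 10) -> List[int]:
    """Return 1-based positions of the words to delete from a passage.

    Grades 1-2: every eighth word; grade 3 and above: every fifth word.
    The first deletion falls between the sixth and tenth words.
    """
    interval = 8 if grade <= 2 else 5
    target = rng.randint(6, 10)  # random starting point, inclusive bounds
    positions: List[int] = []

    while target <= len(words) and len(positions) < max_deletions:
        position = target
        # Assumption: skip forward past words that may not be deleted
        # (e.g. anything other than nouns and verbs in grades 1-2).
        while (is_eligible is not None and position <= len(words)
               and not is_eligible(words[position - 1])):
            position += 1
        if position > len(words):
            break
        positions.append(position)
        target = position + interval
    return positions
```

With a 45-word grade 2 passage and no part-of-speech filter, any starting point from six to ten yields five deletions, which is consistent with the five items in the grade 2 sample passage.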

The correct multiple-choice response to a cloze item is the exact
word deleted from the passage. To assure distractors of appropriate
difficulty for the test items, graded lists of nouns, verbs, adjectives,
and adverbs were prepared using Harris and Jacobson's Basic Elementary
Reading Vocabularies (1972) and EDL Research and Information Bulletin 5:
A Revised Core Vocabulary (Taylor, Frackenpohl, and White, 1969). Special
content words for subject matter areas like Social Studies were compiled
using the Harris-Jacobson material and the American Heritage Word Frequency
Book (Carroll, Davies, and Richman, 1971).

Initially, distractors were selected from appropriate lists by use of
a table of random numbers. Later, a computer program was written for
automatic random selection of distractors. Each set of distractors was
reviewed to eliminate tricky or ineffective distractors, such as synonyms,
and to assure that the distractors agreed with the stem in tense, number,
and so forth.
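
Random selection of distractors from a graded word list might look like the sketch below. It is not the program referred to in the text, whose details are not given; the function names are hypothetical, a pseudo-random generator replaces the table of random numbers, and the manual review for synonyms and grammatical agreement is left outside the code. The number of alternatives per grade follows the figures stated earlier in this chapter (three at grade 1, four at grades 2 and 3, five at grade 4 and above).

```python
import random
from typing import Iterable, List


def alternatives_for_grade(grade: int) -> int:
    """Number of response alternatives per item at a given grade."""
    if grade == 1:
        return 3
    if grade <= 3:
        return 4
    return 5


def build_item(answer: str,
               graded_words: Iterable[str],
               grade: int,
               rng: random.Random) -> List[str]:
    """Return shuffled alternatives: the exact deleted word plus distractors.

    graded_words is assumed to be the list for the answer's part of
    speech and the passage's grade; the answer itself is excluded.
    """
    n_distractors = alternatives_for_grade(grade) - 1
    pool = sorted({w for w in graded_words if w.lower() != answer.lower()})
    if len(pool) < n_distractors:
        raise ValueError("word list too short for this grade")
    options = rng.sample(pool, n_distractors) + [answer]
    rng.shuffle(options)
    return options
```

A call such as `build_item("window", ["paper", "apple", "oven", "door"], 2, random.Random(1))` returns four alternatives, one of which is the exact deleted word.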

With a minimum of 3 deletions per passage at grade 1 and a maximum
of 10 deletions per passage at grade 3 and above, nearly 15,000 multiple-
choice items have been prepared for the SPPED Cloze Exercises.

Format. All cloze passages and test items were put in a comparable
format. The format gives (1) the identification number of the passage,
(2) a title (provided by the item writer), (3) the passage itself, and
(4) the test items. Large (Bulletin) type was used for the first two
grades. A sample cloze passage for grade 2 is shown in Figure 4.1.

Table 4.3



Specifications for Cloze Passages and Test Items

                                                                Grade 4
                  Grade 1       Grade 2       Grade 3           and above

Passage length    25-35 words   40-45 words   60-70 words       [...] words

Words deleted     Nouns         Nouns         Nouns             Nouns
                  Verbs         Verbs         Adjectives        Adjectives
                                              Verbs             Verbs
                                              Adverbs           Adverbs

Frequency of      Every 8th     Every 8th     Every 5th         Every 5th
deletions         word          word          word              word

Deletions per     3             [...]         10                10
passage

Alternatives      3             4             4                 5
per item



WHAT DOES ANDY SEE?

Andy saw something at his bedroom __1__. He ran as fast as he could to
__2__ Mother.

"Listen, Mother!" said Andy.

Mother said, "Please __3__, Andy. I have to get ready to __4__ to work now.
Mrs. Coats is __5__ for me."

1.  a. paper        2.  a. wash        3.  a. peep
    b. window           b. tell            b. fly
    c. apple            c. buy             c. point
    d. oven             d. fix             d. wait

4.  a. think        5.  a. finding
    b. sing             b. roping
    c. tip              c. waiting
    d. go               d. racing

Figure 4.1. Sample cloze passage and items.






APPLICATION OF THE MULTIPLE-CHOICE CLOZE IN
MEASUREMENT AND EVALUATION

The multiple-choice cloze materials are one component of a testing
system intended to offer to the educational community more useful and
adaptable measures of reading comprehension than are currently provided by
standardized tests. This chapter will first briefly describe that testing
system, the Test Development Notebook (TDN). It will then point out various
advantages and features of the multiple-choice cloze materials and discuss
the utility of these materials for a variety of evaluation and decision-
making purposes. Next, the chapter will present the principles involved in
applying the multiple-choice cloze materials in specific testing situations.
The chapter will conclude with a description of the test assembly and
administration procedures followed in the first experimental application of
the multiple-choice cloze passages and items.

The Test Development Notebook
The initial chapter of this report cited a rigidity of format as one
of the major shortcomings of standardized reading tests. The Test Development
Notebook was originally conceived as a flexible test-construction resource
that would provide school districts with large numbers of reading items,
identified by different skills or objectives and difficulty, which could
be assembled in different ways to meet different evaluation needs. The TDN
was at first planned to include several different formats which might
measure unique aspects of comprehension. To date, there are sizable item
pools for two of these item formats, the multiple-choice cloze and the
wh-/main idea items.

The multiple-choice cloze item pool, the development of which was
described in detail in the previous chapter, consists of approximately
1,374 clozed passages (i.e., generally, passages with ten deletions and
accompanying multiple-choice items) categorized (temporarily) by readability
levels determined by Spache and Dale-Chall readability formulas. The
wh-/main idea item pool consists of 200 passages, 10 at each of 20
readability levels, whose lengths vary systematically by readability level
(e.g., approximately 25 words at level 1 and up to 320 words at levels
17-20). Each of these passages is accompanied by up to four multiple-choice
main idea items and up to eight multiple-choice wh-detail items modeled
after Bormuth's (1970) wh-items. The formats of the cloze and the wh-
materials are both generative procedures for preparing numbers of parallel,
multiple-choice items.

The concept of the TDN, which is currently a "paper bank," derives from
the computerized and paper-based approaches to test assembly formalized
in such projects as the Sequoia Comprehensive Achievement Monitoring (CAM)
program in Redwood City, California, where thousands of items have been
banked to support local test assembly. The organization of the TDN,
however, has also benefited from several years of experience in developing
and refining CAM in schools in New York State. Much was learned from the
New York experience about the practical aspects of making some of the
newer concepts in evaluation work broadly in practice. (Referred to here
are several years of applying complex evaluation designs, such as
longitudinal matrix sampling, that are now routinely used in schools as part
of local projects.) In addition, the TDN, particularly in the design
specifications for the multiple-choice cloze materials, draws heavily on
ideas on domain-referenced testing advanced in 1970. The organization of the
TDN considerably improves on the Sequoia project and similar efforts. Instead
of simply filing and providing for accession of items and related
information, the TDN is ultimately generative for both item and test
production. That is, the cloze item format appears to be capable of
conversion to an algorithm which can be used to process any appropriate
sample of written discourse into items. An indefinite number of such items
can therefore be generated. In addition, finished items in the bank can be
accessed and organized into an infinite number of tests by a process that
interfaces directly with the actual printing and production of test forms.
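
The idea of drawing finished items from the bank into a test form can be illustrated with a short sketch. Nothing here is taken from the TDN itself: the `Passage` record, its field names, and the `assemble_test` function are hypothetical, and the selection rule (one randomly chosen passage per requested difficulty level within a domain) is simply one plausible way to assemble a form.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Passage:
    """A banked cloze passage (hypothetical record layout)."""
    passage_id: str
    domain: str            # e.g. "textual", "citizen", "consumer", "reference"
    difficulty_level: int  # 1-28, from the readability formulas


def assemble_test(bank: Sequence[Passage],
                  domain: str,
                  levels: Sequence[int],
                  rng: random.Random) -> List[Passage]:
    """Pick one passage per requested difficulty level from one domain."""
    form: List[Passage] = []
    for level in levels:
        matches = [p for p in bank
                   if p.domain == domain and p.difficulty_level == level]
        if not matches:
            raise LookupError(f"no {domain} passage at level {level}")
        form.append(rng.choice(matches))
    return form
```

Because the bank is only filtered and sampled, the same call with a different random seed yields a parallel form at the same difficulty levels.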

The TDN is, therefore, an attempt to build a generalized test assembly
resource. At its current stage of development, the multiple-choice cloze
component of the TDN is the most important aspect of this generalized test
assembly resource. The multiple-choice cloze materials permit measurement
of literal comprehension across the total range of interest to a variety
of evaluation contexts (e.g., tests may be assembled to assess first
graders' ability to comprehend literally a basal reader or high school
students' ability to comprehend literally texts in the content areas).
The following sections describe the properties of the multiple-choice cloze
materials, and accompanying advantages thereof, which are summarized in
Table 5.1.

Table 5.1



Property                      Advantage

Objectivity in item format    The format of the multiple-choice cloze
                              adheres to the concept of an item form, an
                              objective and generative procedure for
                              producing items that are unbiased
                              representations of a universe of content. The
                              application of this principle here avoids
                              the problems of subjectivity [...] in the
                              construction procedures, criticisms which have
                              been characteristically levelled at tests of
                              reading comprehension.

[...]                         [...] of passages have been systematically
                              [...] representing relevant domains of written
                              discourse. [...] and properties of readability.
                              [...] the resulting reading test scores [...]
                              directly useful in reading [...]

[...]                         [...] and generalizable measure of reading
                              comprehension, appropriately termed "literal
                              [...]" the reading behavior [...] affected by
                              the [...]

[...]                         Cloze passages are to be calibrated on an
                              equal-interval scale; this represents a [...]
                              scaling procedures in tests of reading
                              comprehension. Passages in the test will [...]
                              items. Any test assembled from the item pool
                              will be referenced to the single scale, which
                              in turn can be related meaningfully to
                              objective performance criteria.

[...]                         Response to the test situation is dependent
                              upon actually reading the test passage, [...]
                              by the nature of the task, completing
                              deletions. There are no questions [...]


Table 5.1 (Continued)

Property                      Advantage

Automated generation of       The objective, generative nature of both the
items and tests               item and test format makes possible the
                              automation of item construction and test
                              assembly and printing. This makes for both
                              speed and economy, which in turn will permit
                              the use of more complex but more useful
                              evaluation designs in the schools.

Flexible resource             The cloze passages are part of a flexible
                              test development resource called the Test
                              Development Notebook (TDN). Instead of
                              providing a set of fixed tests, the TDN offers
                              a collection of materials allowing rapid and
                              economical assembly of large numbers of
                              special purpose tests to fit a variety of
                              evaluation needs. This format mitigates the
                              problem of maintaining test security in large
                              scale policy-oriented evaluation studies of
                              reading and contributes to economy in test
                              assembly.



Objective Item Format

The lack of objectivity or reproducibility in test construction procedures
has been a major criticism of norm-referenced measures of reading
comprehension. Norm-referenced measures usually lack an explicit theory of
comprehension and objectivity or reproducibility. The development of the
multiple-choice cloze format in the TDN has involved a constant and, to
date, largely successful effort to improve and maintain objectivity.
Objectivity here, as noted, is important because, given other conditions, it
enhances the possibility of repeatedly generating a test that incorporates
an unbiased sampling of the content and behaviors of the universe of
interest. The presumed unbiased nature of the test is, furthermore,
traceable. Others interested in the operations defined by the test may
generate similar or comparable tests, or the test may be generalized to
other relevant behavioral domains in the course of extending or studying
the underlying construct.

Effective solutions to the problem of objectivity in the generation
of domain-referenced test items are offered in separate models by Hively
et al. (1973) and Bormuth (1970). The generative item format represented
by the multiple-choice cloze format used here is an application of Hively's
concept of an item form in the domain-referenced testing model: (a) the
multiple-choice cloze format constitutes the fixed standard structure
which contains one or more variable elements, and (b) the various unclozed
passages and the distractor lists available for item and test construction
are the replacement sets for those elements.

At latest study, the multiple-choice cloze format seems to offer the
potential of being almost wholly objectively reproducible. Several
modifications in procedure and format now under consideration will reduce
potential biases in passage selection that may have resulted from earlier,
unnecessarily rigid limitations on passage length; the use of titles on
passages; and possibly insufficient unmutilated context at the beginnings
and ends of passages.

The current rule-based procedure for conversion of passages to the
multiple-choice cloze format seems to offer additional potential for
computerization, thus further approaching the possibility of reducing the
item form to an algorithm. This computerization would presumably bring
together separate programs for readability analysis, conversion of passages
to the mutilated format, and the generation and assignment of distractors
from the word lists.

As the objectivity of the multiple-choice cloze item construction and
test assembly procedures of the TDN is further improved, some minor
modifications of format will undoubtedly result. The current set of item
analysis data is expected to contribute substantially to determining such
modifications. Experience with assembling and using the test to date
has also shown (see Chapter VIII) a number of areas where objectivity can
now be enhanced.

Domain-referenced Content

The passages for the multiple-choice cloze component of the TDN were
systematically drawn from clearly defined and relevant domains of written
discourse. The selection of relevant domains was aided by reference to the
Hansen and Hesse effort in Madison, Wisconsin (1972) to [...]-referenced
test based on the standard cloze. In that effort parents and teachers
identified relevant domains of written discourse on the basis of frequency
of use or importance in the school and community for students in grades 4
through 12. The present effort improves on the relevance and specificity
of these domains by extending the domains and levels to grade 1 and by
incorporating readability as an additional defining characteristic.

The domain-referenced model is intended to support generalization from
test scores to relevant domains of application. Theoretical and empirical
clarification of the concept of literal comprehension, by contrast, is
potentially indicative of defaults in the processes underlying literal
comprehension. The ability to specify both the process of comprehension
and the circumstances of its expression (i.e., the classes and levels of
written discourse involved) in a test constitutes the basis for using
comprehension test scores in decision-making in reading instruction.

Using the domain-referenced model as a basis for assembling the variety
of passages for use in the TDN is a deliberate attempt to maximize the
relationship between the test situation and program content. Program
content includes relevant skills and materials involved in reading situations
in the school and community. Maximizing this relationship should, in turn,
make it possible to produce tests of reading comprehension that are maximally
sensitive to some of the most important outcomes of reading instruction.
Such tests would be designed so that at a given level each test form is
suitable to the reading abilities of a given student population and so that
the content of the test is relevant to and reflects the changing nature of
the reading experiences of that group over time. A survey test for first
graders, for example, would contain a range of passage difficulty that
would reflect the range of written discourse relevant to first graders in
the school and community. A set of tests assembled for first graders
according to this principle would, at one point in time, theoretically
generate the distribution of mean passage scores depicted in Figure 5.1.




[Horizontal axis: Passages 1 through 6, by Order of Difficulty]

Figure 5.1. Ideal distribution of mean passage scores on
survey test in reading given at beginning
of grade 1.

Consider that the test contains six multiple-choice cloze passages ordered
by difficulty or readability level. At the beginning of grade 1, as shown in
Figure 5.1, the mean score on passage 1 is less than 10%, and it drops still
lower on the other passages. This would be the expected performance of most
first graders with relevant reading passages in a September test
administration; most of them would not be able to apprehend the literal
meaning of even the simple 25-word passages at readability level 1.

Figure 5.2 repeats the information of Figure 5.1 and further demonstrates
ideal or expected passage scores at later points in time for a test that
accurately reflects experience with reading materials. According to this
illustration, by the middle of grade 1 the mean passage score at readability
level 2 is about 75%, whereas in September it was near zero. By the end of
grade 1, the mean passage score at readability level 3 stands near 75%.

Figure 5.2. Ideal mean passage scores for survey test in
reading given at beginning, middle, and end
of grade 1.

An appropriately designed test of literal comprehension will thus
generate a series of distributions over time which will reflect the
increasing ability of a given student population to apprehend the literal
meaning of increasingly difficult passages and more diverse domains. The
nature of these performance distributions is a function of the time between
test administrations, the degree of relevance of the content and the
difficulty levels of the passages selected for the test, the ability of the
students, and the degree of learning that occurs in the time framework of
the test. The ability of such a test to measure certain broad effects of
reading instruction and experience is theoretically maximized when the test
is designed to generate the distribution shown in Figure 5.2 for each level
of a given instructional system.

The ability to generalize from a score distribution like that shown
in Figure 5.2 to one or more content domains is dependent on specifying the
readability and content characteristics of both the test passages and the
domains from which they were drawn. The description in Chapter IV of the
design and assembly of the multiple-choice cloze component of the TDN
showed that specification of the readability and content characteristics is
well underway in the current development effort. Each of the cloze passages
in the current set of testing materials has a Dale-Chall or Spache value
identified by a two-digit number, and considerable progress has been made in
specifying the distribution of readability levels in each content domain
in the TDN. What remains to be learned is the passage performance criteria
to be applied to passages at a given level of reading development for the
student population (e.g., Does 80 percent correct on a passage signify the
passing criterion to be applied to a given population of students and
passages?).¹

¹Some studies (e.g., Bormuth, 1971) suggest that the efficiency level
of readers improves with age and experience, thus suggesting the development
of age-graded performance criteria.

Unidimensionality

The multiple-choice cloze constitutes a procedure for measuring a
type or level of reading comprehension that has its basis in cognitive
theory and in generative theories of language (Smith, 1975). The test or
item format is an attempt to measure comprehension at one level of cognition,
i.e., the literal meaning of a written message. This is distinct from such
other types of comprehension as induction, deduction, or evaluation, where
the reader is required to go beyond the literal meaning.

The design of the test format is basically coherent with the ongoing
act of comprehension, which is characteristically rapid and focused on the
processing of relatively large informational units. The structure of the
passages in a given test is unchanged by the multiple-choice cloze format.
The act of responding to the word choices remains focused on the meaning
of the passage, interrupted only by periodic deletions. In this act, the
reader predicts the words that correctly replace these deletions to complete
the intended meaning by drawing upon his own relevant cognitive structures,
unique or otherwise, and not, as may be the case with other formats, those
in the mind of a test-maker.

The behavior measured by the multiple-choice cloze format, termed
literal comprehension, is one of a small number of hypothetical factors which
account for how a reader might process different types of written materials
under different circumstances. Literal comprehension is probably that
component of comprehension that is most heavily affected by instruction and
experience during and beyond the period normally given to the formal reading
program. The focus on a measure of literal comprehension, therefore, promises
to result in a more sensitive methodology for testing the actual or achieved
outcomes of reading instruction across much of the term of public education.

Furthermore, the organization of the present TDN affords a unique
opportunity for studying the meaning and development of comprehension
in a response context that is essentially constant across age and grade.
The multiple-choice cloze format offers basically the same stimulus to
different age-graded respondents, the only differences lying in the
complexity or difficulty of the passages used at different levels. The
Rasch measurement model, moreover, is currently being experimentally applied
to cross-sectional data on the test to provide assurance of unidimensionality.
This may be one of the first reading tests where dimensionality was studied
at the level of the item format in a developmental context. Certainly,
existing tests of comprehension do not provide the assurance that apparently
similar types of multiple-choice questions measure the same factors across
test levels.
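
For reference, the Rasch model mentioned above is usually written as follows in its dichotomous form. The notation is an assumption introduced here for illustration and is not taken from the report: theta_v is the ability of reader v and delta_i is the difficulty of item (or passage) i, both expressed on the same logit scale.

```latex
P(X_{vi} = 1 \mid \theta_v, \delta_i)
  = \frac{\exp(\theta_v - \delta_i)}{1 + \exp(\theta_v - \delta_i)},
\qquad
\ln \frac{P(X_{vi} = 1)}{1 - P(X_{vi} = 1)} = \theta_v - \delta_i
```

Because the log-odds of a correct response depend only on the difference between ability and difficulty, equal differences anywhere on the scale carry the same meaning, which is the sense in which calibrations under the model are said to have equal-interval properties. A single ability parameter per reader is also what makes fit to the model a check on unidimensionality.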

Equal-Interval Scaling

The application of the Rasch measurement model to the total pool of
clozed passages, using the technology provided by Wright and Mead,² will
result in the calibration of all passages on a single scale with equal-
interval properties. Because of the way it was assembled, it will be
possible to fix this scale in relation to meaningful upper and lower limits
that will give the scale the properties needed for cost benefit analyses
and other important evaluation purposes. The lowest level on the scale,
for example, will be fixed just above the easiest of passages available
in pre-primers, while the upper level of the scale will be fixed to the
more difficult passages sampled in twelfth grade and in selected adult
reading materials. The total scale of passage calibrations will thus cut
across the total range of passage readability that is relevant to the grade
1-12 student population.

²Personal communication.

The equal-interval scaling of the test passages provides the possibility
of determining how much of the total reading curriculum in terms of literal
comprehension has been achieved at a particular point in time. For example,
if the readability levels of basal readers and related texts range across
some 20 readability levels,³ and a given group has attained a test
score that represents mastery at the uppermost of these levels, it
may be concluded that the level of achievement with this domain is 100%.
If it was achieved in 9 instead of 10 years, then the level of efficiency
might be given as 110%.
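
The arithmetic behind these two percentages can be sketched as follows. The
report does not state its formulas, so the definitions below (levels mastered
over levels in the domain, and expected years over actual years) are
assumptions made for illustration. Under them, finishing in 9 rather than
10 years gives a ratio of about 111%, which the text appears to round to 110%.

```python
def achievement_percent(levels_mastered: int, levels_in_domain: int) -> float:
    """Share of the domain's readability levels mastered (assumed definition)."""
    return 100.0 * levels_mastered / levels_in_domain


def efficiency_percent(expected_years: float, actual_years: float) -> float:
    """Expected time divided by actual time (assumed definition)."""
    return 100.0 * expected_years / actual_years


# A group that has mastered the uppermost of 20 readability levels.
achievement = achievement_percent(20, 20)

# The same content achieved in 9 years instead of the expected 10.
efficiency = round(efficiency_percent(10, 9), 1)

print(achievement, efficiency)
```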

The foregoing scaling properties allow for the potential development
of a variety of meaningful scores that will appropriately transform the
base score of any test assembled from the item pool by taking into account
the amount of instructional time and the amount of content achieved. Content
is defined as the number of domains at specified readability levels
(e.g., presumably by 6th grade, which is the half-point of school time,
some specifiable proportion of each relevant domain of written discourse
should be achieved). The fineness of the calibrations of passages in the
TDN which application of the Rasch model will achieve, moreover, will make
the item pool appropriate for the assembly of tests with relatively fine
or coarse calibrations as required for different assessment purposes. The
original item pool was assembled using equivalent passages that were
calibrated (by the use of readability formulas) by half-grade readability
intervals. The Rasch calibration should yield a much finer scale of passage
calibrations.

³ The sampling of readability levels of basal readers and related
texts rarely resulted in materials above level 20, while materials in the
content areas were generally shown to be more difficult at the upper grade
levels. One may conceive of a derived score which would express the
average of the proportions of each domain that could be read by a given
individual, based on the distributions of readability levels by content
domain and grade or age.
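
For readers unfamiliar with the Rasch model, a minimal sketch of its basic
form may help. It gives the probability of a correct response as a logistic
function of the difference between a person's ability and an item's (or
passage's) difficulty, both on the same logit scale. The function name and
the example values are illustrative only, and the sketch ignores guessing on
multiple-choice items; it is not the Wright and Mead calibration program.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the one-parameter Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability equals difficulty, the chance of success is one half.
p_matched = rasch_probability(1.0, 1.0)

# Only the difference matters: a one-logit advantage gives the same
# probability anywhere on the scale, which is the sense in which the
# scale has equal-interval properties.
p_high = rasch_probability(2.0, 1.0)
p_low = rasch_probability(0.0, -1.0)

print(p_matched, p_high, p_low)
```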

The Dale-Chall scores of the passages at given points on the projected
Rasch scale will provide a partial basis for generalizing a test score to
a given universe of written discourse. The additional referents required
are, as noted, the mastery performance criteria to be applied to a given
level of the population and distributions of readability scores for the
domains of written discourse relevant to a particular level of the student
population.
Passage Dependency

As noted in Chapter I, the validity of several well-known tests of
reading comprehension has been seriously challenged on the issue of passage
dependency: the tendency for students to obtain scores well above chance
without reading the test passages. Some authors have proposed that this
issue be handled by redefining comprehension as information gain (Bormuth,
1970) or as a residual gain (Rankin and Dale, 1969) rather than by making
better tests. These procedures attempt to remove from the test score the
influences of specific and general knowledge and various other test-taking
strategies that operate on test questions independent of the test passages.

Though one might wish to define new learning in this way, i.e., as
information gain, redefining comprehension in such terms seems to create
new problems. One has only to consider for a moment the virtual impossibility
of distinguishing between that part of the test score that represents prior
knowledge of the reader and that part which represents new learning. How
does one ask a question about a passage in the present situation which does
not involve the interplay of old and new information or knowledge?

It does not seem desirable, therefore, or possible to attempt to remove
the influences of specific, general, or idiosyncratic knowledge from
comprehension test scores. According to the position taken here, all meaning
exists in the reader, and reading or comprehending necessarily involves
bringing such meaning to bear on comprehending a passage. The best available
solution to the problem of passage dependency, therefore, involves eliminating
as much as possible the influence of additional meaning or other irrelevant
informational cues offered by the test situation. Such influences are
necessarily present in the typical multiple-choice questions in tests of
comprehension (e.g., idiosyncratic meanings are introduced by the ways in
which the test-writer interprets a given passage), but it would be difficult
or impossible to eliminate them without also seriously affecting the
measurement of comprehension.

The multiple-choice cloze format appears to avoid the issue of
introducing into the test situation idiosyncratic meaning that both affects
the test score and interferes with the student's attempt to process
independently the information in the test. Theoretically, this was
accomplished by eliminating semantically plausible distractors from the word
choices given for each deletion and by avoiding syntax as a basis for choosing
the correct response.⁴ The effect of other factors (e.g., distractor length)
on the test score is another issue that will be clarified by the distractor
review process. However, only empirical study will determine the overall
degree of success obtained in handling this problem.

⁴ The formation of part-of-speech word lists in the distractor generation
process, for example, is one of several strategies designed to eliminate the
possibility of selecting the correct word without reading the passage.

Automated Generation of Items and Tests

As noted, the cloze component of the TDN is not a fixed test but will
be a bank or collection of calibrated passages intended for various
evaluation purposes. Also, the flexible notebook format is an effective device
patterned in principle after the item banking experiments that have proven
so successful in supporting the economical assembly and maintenance of tests
for CAM in the Sequoia and Hopkins projects in California and Minnesota,
and in various installations in New York State (cf. Gorth et al., 1975).

Experience in these projects shows that approximately half the cost
of providing achievement scores to students is in the development, design,
assembly, maintenance, and production of tests (Gorth et al., 1975). In
addition, the more sophisticated and useful evaluation designs are not even
economically feasible unless some way is found for systematizing the
generation, maintenance, and production of the required tests. For example, it
is obvious that state and district level evaluation models can be greatly
improved by eliminating the tenuously secure standardized achievement batteries
now used in favor of the multi-matrix sampling approaches which incorporate
a large array of test forms and also produce data that are more broadly
representative of a system's goals. However, the lack of effective technical
support for mounting such designs has most likely been the major factor
preventing their implementation at state and local levels.

The design of the cloze and other components in the TDN anticipated
the effective use of technology to support the development and maintenance
of a given bank of items and the assembly and economical production of large
numbers of finished test forms. Because of the form taken by the cloze
component of the TDN, it has also become feasible to partially automate the
item generation procedure. The discussion on objectivity or reproducibility
indicated that the processes of producing a clozed passage might be computer
programmed once the passage was selected.
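
A rough sketch of what such a program might look like is given below. It is
not the procedure used for the TDN: the every-nth-word deletion rule, the
tiny part-of-speech word lists, the sample sentence, and all function names
are assumptions made for illustration. The one idea taken from the text is
that distractors are drawn from a list of words of the same part of speech
as the deleted word, so that syntax alone cannot identify the answer.

```python
import random

# Hypothetical hand-built part-of-speech lists; a working item bank
# would need far larger lists and a review of every distractor.
POS_LISTS = {
    "noun": ["boat", "poles", "letters", "airplanes", "cage", "picture"],
    "verb": ["fish", "come", "step", "gave", "went"],
}


def pos_of(word):
    """Return the part of speech of a word, or None if it is not listed."""
    for pos, members in POS_LISTS.items():
        if word in members:
            return pos
    return None


def make_cloze(text, every=6, n_choices=3, seed=0):
    """Delete every nth word that can be classified and attach choices."""
    rng = random.Random(seed)
    tokens = text.split()
    items = []
    for i in range(every - 1, len(tokens), every):
        core = tokens[i].strip('.,?!"').lower()
        pos = pos_of(core)
        if pos is None:
            continue  # leave words we cannot classify in place
        pool = [w for w in POS_LISTS[pos] if w != core]
        choices = rng.sample(pool, n_choices - 1) + [core]
        rng.shuffle(choices)
        items.append({"position": i, "answer": core, "choices": choices})
        tokens[i] = tokens[i].lower().replace(core, "______")
    return " ".join(tokens), items


cloze_text, cloze_items = make_cloze(
    "Sam and Ben went to fish. Sam had an old green boat."
)
print(cloze_text)
for item in cloze_items:
    print(item["choices"])
```

Because the random generator is seeded, the same passage always yields the
same item, which is one simple way to get the reproducibility the text asks
for.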
The technical support for storing, reviewing, assembling, and printing
multiple copies of passages and items in the cloze component of the TDN
is based on the use of the Mergenthaler V-I-P (Variable Input Phototype-
setter) Model 7245-3. This phototypesetter reads a paper-punched tape
which utilizes the standard TTS (teletypesetting) 6-level code. The TTS
6-level code enables the selection of 96 characters (alphanumerics) and
22 command codes specifying typesetting parameters (such as fixed and
variable spacing) and machine control functions (such as shifts and line
endings). The Mergenthaler "reads" the command codes and then exposes the
selected characters in the command format onto photo-mechanical paper. From
this, a printing plate is produced for the rapid duplication of multiple
copies.

The process of converting the cloze format passages and items to
this paper-punched tape medium and inserting the programming instructions
on layout for printing is currently underway. This process interfaces
directly with printing (eliminating conventional procedures for typing
drafts), and also supports a system of easy storage and editing. The
1,374 cloze passages and items are being stored on approximately 350 tapes.
After evaluation and field testing, individual items can be edited, and
passages altered or replaced through a video-correction terminal in the
State Education Department.

This procedure also provides an economical means of generating various
test forms. Depending on field-testing, and eventually user needs, test forms
can be generated through the selection of the appropriate tapes from the
bank.

Figure 5.3 is a copy of one multiple-choice cloze passage produced
on the Mergenthaler phototypesetter. The copy is in Caledonia-Bold
typeface, 14-point size. The mark-up sheet for this passage is shown
in Figure 5.4.

                        GOING FISHING

        Sam and Ben went to ______. Sam
        had an old boat. Ben had two fish
        ______. He gave one to Sam. "Shall
        we ______?" asked Ben.

        1. fish          1. letters        1. cage
        2. come          2. poles          2. picture
        3. step          3. airplanes      3. fish

Figure 5.3. A copy of a cloze passage as produced by
a phototypesetting machine

The foregoing process is the core of a test assembly procedure that
is now operational and effective for the problem it addresses.⁵
It will ultimately be integrated with the computer to further improve speed
and economy and also to interface test production with the process of
analyzing response data.⁶ The computer will store information on all
passage characteristics on a disc or tape file and will provide a program
for the selection of passages on the basis of several simultaneous criteria.
Presumably, these criteria will include the range of calibrations desired
in the test, the number of passages, the content areas to be sampled, and
the grade level(s) of the student population.
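
A selection program of the kind anticipated here might be sketched as
follows. The record layout, field names, and sample passages are
hypothetical, and spreading the chosen passages across the calibration range
is only one of several reasonable rules; the report does not specify how the
program would choose among qualifying passages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    passage_id: str      # hypothetical identifier
    content_area: str
    calibration: float   # readability level or Rasch difficulty
    source_grade: int


def select_passages(bank, *, calibration_range, content_areas, grades, n_passages):
    """Apply all criteria at once, then spread picks over the range."""
    low, high = calibration_range
    matches = sorted(
        (
            p for p in bank
            if low <= p.calibration <= high
            and p.content_area in content_areas
            and p.source_grade in grades
        ),
        key=lambda p: (p.calibration, p.passage_id),
    )
    step = max(1, len(matches) // n_passages)
    return matches[::step][:n_passages]


bank = [
    Passage("A", "reading/literature", 8.0, 4),
    Passage("B", "reading/literature", 9.0, 5),
    Passage("C", "science", 9.0, 5),
    Passage("D", "reading/literature", 10.0, 5),
    Passage("E", "reading/literature", 12.0, 6),
    Passage("F", "reading/literature", 14.0, 7),
]

chosen = select_passages(
    bank,
    calibration_range=(8, 12),
    content_areas={"reading/literature"},
    grades={4, 5, 6},
    n_passages=2,
)
print([p.passage_id for p in chosen])
```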

Once the content of the test(s) has been specified, the program will
be capable of generating data decks that identify the characteristics of the
test and determine how it is to be scored. A test generated by such a system
will be provided in the required number of copies and will be scored and
processed for reporting purposes. It is expected that the development
work on the cloze component of the TDN will be carried to this point.



⁵ Currently, the tapes can be conveniently filed and used for the
assembly of large numbers of tests without the aid of a computer. The
entire set of 350 tapes is being reviewed and edited prior to the
production of 1,000 copies of the item bank. These are to be used in local
assembly of test forms by constructing test form masters directly from
hard copy.

⁶ Other files, such as the student files, will need to be set up before
the test is scored and a report produced. However, the anticipated
production of the item scoring file, along with the test, will contribute to
the speed and economy with which a given evaluation design can be mounted based
on the production of a set of unique tests.
[Figure 5.4: mark-up sheet for the "Going Fishing" passage, showing the
passage text and answer choices interleaved with typesetting command codes.]

Figure 5.4. Example of a cloze passage prepared for keypunching on a
teletypesetting machine showing textual copy and typesetting
commands.
Flexible Resource

Perhaps the principal advantage of the multiple-choice cloze materials,
in the context of the TDN, is flexibility of application. The anticipated
range of application of the cloze component of the TDN is defined in terms
of three levels of evaluation identified in column 1 in Table 5.2. They
are survey testing, achievement monitoring, and diagnostic or tailored
testing. The key decision-makers at each level of evaluation are given in
column 2. The time frame of test administration in a level of evaluation
is shown in the third column. The fourth column gives some brief examples
of the purpose of the test administration, and the final column shows examples
of the types of decisions that each group might make, given the kinds of
data that result from a type of testing. In practice, no one level of test
information is used exclusively by any one decision-making group. Rather,
information from testing becomes progressively less useful as it is more
removed from its intended primary reference group.

The level of testing that is undoubtedly most familiar to most
educational decision-making groups, professional and client alike, is the survey
test usually associated with the widespread annual administration of
standardized achievement tests. The example of survey testing given later in
this chapter carries the same intent: to assess the status and development of
the student population in terms of major educational outcomes and domains
of application, in this case literal comprehension as applied to specified
categories of written materials.

Survey testing. The survey-testing design using the cloze passages
presents each student with a sample of passages, such that a broad range of
relevant written discourse is tested against specified populations in a school
or district. The data resulting from such a design will provide estimates of
the status of literal comprehension in relevant domains of written discourse
by grade level, building, or attendance area. For example, the results could
show that, across a sampling of 25 different basal reader systems, 20 percent
of first graders scored 90 percent or better at the highest level of passage
difficulty in June. At higher levels in the educational system, the same
type of performance estimates could be shown for a broader set of relevant
materials. For example, the survey test for middle-school students would
likely sample across reading, language arts, science, social studies,
reference materials, consumer materials and so on. Such a design might thus
make use of the principles of multi-matrix sampling by sampling the various
item domains available to obtain the broadest possible representation of
content on the test.

Table 5.2

Range of Application of the Multiple-Choice Cloze Materials

Level of          Key               Time frame      Purpose                 Decisions
evaluation        decision-makers
----------------  ----------------  --------------  ----------------------  ---------------------------
Survey testing    Administrators    Annual or       Assess comprehension    Allocate resources;
                                    bi-annual       in a range of levels    determine effectiveness
                                                    and domains across      of reading program(s)
                                                    student population.     (by district, buildings,
                                                                            levels, or other units)
                                                                            over a long-term period.

Achievement       Principals,       Periodically    Assess growth of        Allocate instructional time
monitoring        teachers,         (e.g., every    comprehension within    and effort continuously
                  and students      5-10 weeks)     a level and domain.     throughout a course;
                                                                            determine student progress;
                                                                            select materials for a
                                                                            course or student; assign
                                                                            students to a level in the
                                                                            system.

Diagnostic or     Teachers,         As needed       Assess comprehension    Determine a student's level
tailored testing  students                          level at one time on    of reading; find materials
                                                    a skill-by-skill        suited to a student's
                                                    basis.                  reading level.

Survey testing is not necessarily useful at the individual level,
particularly when matrix sampling is involved, since any one student may
receive a test composed of only a narrow sample of passage content and reading
levels. Survey data are primarily used to generate group performance estimates
aggregated to a particular level of interest, e.g., all fourth graders in a
given building. Survey data based on the TDN multiple-choice cloze materials
will thus enable administrators and program managers primarily to examine and
follow the development of the total reading program from year to year. The
associated decision-making will typically be broad and long-term. The
district administrators will generally use the test results to identify
needs in terms of student groups and problems with areas of written discourse.
They will use the data to examine the development of literal comprehension
over several years. They may ultimately begin to adjust the body of textual
and other written materials to fit the reading needs of the school
population. All of these and other decisions will be largely based on a
new type of norm made possible by the structure of the testing materials.
This is a norm that is referenced to a given category and level of written
discourse and is exemplified by the statement: 71 percent of seventh
graders achieved literal comprehension scores of 90 percent or better on a
sampling of editorials from major newspapers.
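
A norm of this kind reduces to a simple proportion. The sketch below assumes
each student has one percent-correct score for the domain in question; the
function name and the scores are invented for illustration and are not survey
results.

```python
def percent_meeting_criterion(scores, criterion=90.0):
    """Percent of students whose domain score is at or above the criterion."""
    if not scores:
        raise ValueError("at least one score is required")
    return 100.0 * sum(score >= criterion for score in scores) / len(scores)


# Invented percent-correct scores for eight seventh graders on one domain.
editorial_scores = [95, 92, 88, 90, 70, 99, 91, 60]
norm = percent_meeting_criterion(editorial_scores)
print(f"{norm:.1f} percent scored 90 percent or better")
```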

Achievement monitoring. The next level of evaluation referred to in
Table 5.2 is achievement monitoring. Achievement monitoring is a newcomer
to the practical context of classroom evaluation. Developed as a standard
design in the project by Gorth et al. (1975), the basic elements of
achievement monitoring are a set of parallel test forms and a longitudinal test
schedule in which the tests are repeatedly administered without duplication
to each student in a program. The forms are randomly administered at fixed
intervals; e.g., with five forms and a bi-monthly test schedule, a given
student might receive tests in the order: 1, 5, 3, 4, 2.
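
The scheduling rule just described (each student takes every form exactly
once, in a random order) can be sketched as a shuffle per student. The
function and student labels are hypothetical; the fixed seed is there only to
make the example repeatable.

```python
import random


def form_schedule(student_ids, n_forms=5, seed=0):
    """Give each student a random ordering of forms 1..n_forms, no repeats."""
    rng = random.Random(seed)
    schedule = {}
    for student in student_ids:
        order = list(range(1, n_forms + 1))
        rng.shuffle(order)
        schedule[student] = order
    return schedule


schedule = form_schedule(["student_a", "student_b", "student_c"])
for student, order in schedule.items():
    print(student, order)
```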

If the design of each test form included cloze passages with equivalent
ranges of readability levels selected from the TDN, each test administration
would yield an estimate of a student's level of literal comprehension based
on the same standard. For example, each of five test forms could sample
readability levels 1-5 for a class of first graders. The resultant data at
each interval would yield individual and group performance estimates at each
readability level; e.g., in September 90 percent of the class achieved 90
percent or better at level 1, 80 percent or better at level 2 ... and 20
percent or better at level 5. The teacher using such data would be
periodically looking for expected increases in literal comprehension at
higher levels of readability as the course of reading instruction unfolded.

Diagnostic or tailored testing. The third level of testing referred to
in Table 5.2 would use passages selected from the cloze component of the TDN
to generate a test tailored for one-time or repeated testing of a single
student. The cloze materials are ideally suited to the rapid assembly of
such tests for the purpose of determining the level of the materials a
student is able to comprehend literally, e.g., texts in the various content
areas, newspapers, news magazines, consumer materials, etc. This testing
would be useful whenever a new student entered the school and his level of
reading ability was unknown. The resultant test scores from the cloze
would indicate the levels of reading materials in each content area that
were appropriate to the student's ability in the instructional and
independent reading contexts.

Application of the Cloze Testing Materials in Practice

Thus far only one of the major evaluation purposes of the cloze
materials, the survey, has been explored. An experimental application of
the survey was administered in late May and early June 1975 to 5,000
grade 1-9 students in a school district in upstate New York. The remainder
of this chapter presents the basic principles for the development of a
survey design and a detailed description of the development and
implementation of this first survey design. The design of this survey test was
developed prior to any calibration of the cloze passages. The purpose of
this survey test administration was to collect item and passage data which
would provide a basis for a validity study, determine the adequacy of the
cloze format, and investigate the utility of readability formulas for test
assembly.

Design Principles

In "Sampling Plans for Domain-Referenced Tests," Millman (1975) presents
the basic principles involved in assembling domain-referenced tests for
varying evaluation purposes. Figure 5.5, adapted from Millman's article,
illustrates these principles. Each cell in the figure represents the
convergence of a particular student, a passage or set of items, and a domain
of written discourse. Tests required for given evaluation purposes are
assembled through a sampling plan which operates on all three of these
dimensions in the framework of time. The evaluator's problem is to select
students and passages in domains relevant to the specific information needed.

Figure 5.5. The dimensions of test design for the
cloze segment of the TDN.

For example, say a reading teacher needed a test to determine the
appropriate level of reading text for a new sixth-grade student. Here the
student dimension would involve only one student; one category of textual
materials (i.e., reading/literature, not the entire textual domain) would
represent the content sampling unit; and the passage domain would be a set
drawn from the 353 available passages in this subcategory in the TDN.

The teacher might begin by randomly drawing a passage from each of
the readability strata represented by difficulty levels 8-12 shown in
Table 5.3. These passages would be used to compose a 50-item test which
would probably encompass the student's reading level. The student's score
on this test would be an estimate of his reading ability across all
readability levels in the reading/literature subcategory. Assurance that this
score is an accurate estimate of the student's ability to comprehend written
materials in this category could be increased by drawing additional passages
from the matrix in Table 5.3. Such passages would represent a narrower
range of readability than the passages on the initial test, and the process
could be repeated several times before the passages in a given cell would
be exhausted.⁷
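
The teacher's procedure amounts to stratified random sampling without
replacement. The sketch below assumes one passage per stratum on the first
draw and a narrower follow-up draw that excludes passages already used; the
passage labels and the number of passages per level are invented and do not
reproduce the counts in Table 5.3.

```python
import random


def draw_stratified(passages_by_level, levels, per_level=1, seed=0, exclude=frozenset()):
    """Randomly draw passages from each readability stratum, skipping used ones."""
    rng = random.Random(seed)
    drawn = []
    for level in levels:
        available = [p for p in passages_by_level.get(level, []) if p not in exclude]
        drawn.extend(rng.sample(available, min(per_level, len(available))))
    return drawn


# Invented bank: four passages at each of readability levels 8 through 12.
passages_by_level = {
    level: [f"L{level}-P{k}" for k in range(1, 5)] for level in range(8, 13)
}

# First test: one passage from each of levels 8-12.
first_test = draw_stratified(passages_by_level, range(8, 13))

# Follow-up test: a narrower range, never reusing a passage.
second_test = draw_stratified(
    passages_by_level, range(9, 12), seed=1, exclude=frozenset(first_test)
)
print(first_test)
print(second_test)
```

If each passage carries ten items, five passages give the 50-item test
mentioned above; the ten-items-per-passage figure is inferred from that
arithmetic rather than stated in this section.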



⁷ At present, only readability information based on Spache and Dale-
Chall formulas is available as a guide for passage selection. Eventually,
passage selection will be based on the Rasch calibration of the total
passage pool. The scale thus calibrated will still be referenced to
readability scores for the sake of providing a technical basis for selecting
texts and other written materials for a range of comprehension scores.

Every scale score or range of scale scores on the total test will be
associated with a set of "equivalent" passages accompanied by readability
score means and standard errors. These data are the primary bases for
estimating the difficulty levels of textual or other written materials
which a given student or group can comprehend with a certain degree of
confidence.
Table 5.3

Multiple-Choice Cloze Passages in the Reading/Literature Textual Domain

Grade   Readability       Grade level of passage source
level   level           1    2    3    4    5    6    7    Totals
-----   -----------   ---  ---  ---  ---  ---  ---  ---    ------
  1          1         11                                     11
  1          2         30                                     30
  2          3          7   14                                21
  2          4              26                                26
  3          5                   12    1                      13
  3          6                   12    4                      16
  4          7                    6   13    7                 26
  4          8                         9    4                 13
  5          9                        11   11    6            28
  5         10                         4    6    4            14
  6         11                              6   11    2       19
  6         12                              2    5    3       10

[Rows for readability levels 13-28 and entries in the remaining
source-grade columns are not recoverable from this copy.]

Column totals, in source order: 48, 40, 30, 42, 36, 33, 30, 30, 30, 34.
Total passages: 353.
The foregoing sampling design for assembling a cloze test based on
passages from the TDN for an individual student is illustrated in Figure
5.6(a). This diagram depicts one student being administered a stratified
random sample of passages from one content domain or category in the TDN.
The strata, indicated by a row of small boxes, are the readability levels
shown previously in Table 5.3. Figure 5.6(b) illustrates nearly the same
plan as 5.6(a), but here a group of students randomly drawn from the unit
of interest (e.g., classroom, grade-level, etc.) is given the same test
form. Readability strata are now indicated by horizontal lines in the
small box. Figures 5.6(a) and (b) represent the simplest sampling designs
that might be drawn for the cloze segment of the TDN.
[Figure 5.6: two panels, (a) Individual student and (b) Group of students,
each drawn against a Passages axis.]

Figure 5.6. Simple sampling designs for the cloze segment of
the TDN.



The more complex matrix sampling designs that are possible with the
cloze component of the TDN are illustrated in Figures 5.7(a) and (b).
Figure 5.7(a) generalizes the one-test-form/one-group plan in Figure 5.6(b)
to a matrix sampling design in which non-overlapping random samples of
passages are assembled into parallel test forms, and each test form is
administered to a different, randomly constituted group of students. On
both sampling dimensions, random stratified sampling is used. Each parallel
test form samples from the same readability strata, and test forms are
assigned randomly to students within ability strata in each experimental
unit of interest: classroom, grade level, and so on.
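
The two sampling steps in this design can be sketched as below. All names and
data are hypothetical; in particular, dealing shuffled students to forms in
rotation is just one way to make random assignment within ability strata come
out balanced, and the report does not say how assignment was actually done.

```python
import random


def build_parallel_forms(passages_by_stratum, n_forms, seed=0):
    """Give every form one distinct, randomly chosen passage per stratum."""
    rng = random.Random(seed)
    forms = [[] for _ in range(n_forms)]
    for stratum in sorted(passages_by_stratum):
        pool = list(passages_by_stratum[stratum])
        if len(pool) < n_forms:
            raise ValueError(f"stratum {stratum} has too few passages")
        rng.shuffle(pool)
        for form, passage in zip(forms, pool):
            form.append(passage)
    return forms


def assign_forms(students_by_ability, n_forms, seed=0):
    """Randomly assign forms to students within each ability stratum."""
    rng = random.Random(seed)
    assignment = {}
    for stratum in sorted(students_by_ability):
        students = list(students_by_ability[stratum])
        rng.shuffle(students)
        for position, student in enumerate(students):
            assignment[student] = position % n_forms
    return assignment


# Invented data: three passages at each of three readability levels.
passages = {level: [f"L{level}-P{k}" for k in range(1, 4)] for level in (6, 7, 8)}
students = {"high": ["s4", "s5", "s6"], "low": ["s1", "s2", "s3"]}

forms = build_parallel_forms(passages, n_forms=3)
assignment = assign_forms(students, n_forms=3)
print(forms)
print(assignment)
```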

The sampling design presented in Figure 5.7(a) is the standard CAM
design. It is technically defined as a longitudinal, multi-matrix sampling
design. Application of the standard CAM design would involve assembly of
several sets of parallel test forms to cover the range of ability in the
groups of interest. For example, evaluation of growth in comprehension in
sixth-grade classrooms might require three to five sets of test forms, each
set encompassing a restricted range of readability (e.g., set one might
have a range of 6-10; set two might have a range of 11-15, etc.). A
student assigned to a set of such test forms would receive them at fixed
intervals throughout the reading program. The differences in domain scores
at each data point would provide a basis for estimating growth or
development in literal comprehension.

Design 5.7(b) illustrates the sampling scheme for survey testing.
This design may be viewed as a one-time application of the standard CAM
design for each content domain of interest. A survey testing system might
thus include a series of parallel test forms for all of the four dimensions
or nine content categories in the TDN, with each set of test forms
administered to a different randomly constituted population of students. The
design could be further varied to yield different levels of information
on a population, depending on the content domain or category. For example,
each individual in the unit of interest (e.g., a grade level in a district)
might receive one of several test forms in the first category
(reading/literature) in the textual domain. This would yield an estimate of
each individual's domain score for that category. Thereafter, in a second
testing session, each individual might receive one additional test form in one
of the remaining domains or categories. In this way, a survey design could be
mounted to yield both individual and group data as required.

There are a number of complex considerations involved in developing
sampling plans for a survey based on the TDN. A plan actually consists of
several sampling designs, each developed for a given level of the population
(e.g., Level I, grades 1-3; Level II, grades 4-6; Level III, grades 7-9).
The passage readability levels of each survey level should overlap. There
may be a relatively large number of test forms, depending on the number of
content areas surveyed, and the test administration schedule will probably
require a computer to effectively accomplish the assignment of test forms
to populations.



[Figure 5.7: two panels, (a) Standard CAM design and (b) Survey testing
design, each drawn against a Passages axis.]

Figure 5.7. Multi-matrix sampling designs for the cloze segment
of the TDN.

An Applied Survey Design 

The experimental survey design based on the multiple-choice cloze
segment of the TDN applied the model illustrated in Figure 5.7(a) to the
reading/literature category in the textual domain. The initial purpose of
this effort was to obtain item analysis data on a large segment of a major
content area in the TDN. Subsequently, the Rasch analysis program
supplied by Wright and Mead (1975) was experimentally applied in the analysis
of items and passages. The Rasch data was also used (in conjunction with
Wright and Mead) to explore the unidimensionality of the characteristic
measured by the multiple-choice cloze.⁸ It was expected that the results
of these analyses would be used as a basis for making any needed adjustments
in the cloze format prior to starting the major effort on calibration and
validation.

Because plans for validating the multiple-choice cloze materials included
concurrent use of modified wh-items (see procedure for producing wh-items,
Appendix A), the survey design included a substantial number (approximately
1,000) of wh-items. (Note: The test in the multiple-choice cloze format
was called Cloze Exercises, and the test in the wh-item format was called
Literal Comprehension, Details.) The design incorporated the largest number
of test forms and items or passages that could be administered to the survey
sample and yield stable data on each item or passage.

The test administration sample is defined in Table 5.4. For each
grade from grade one through nine between 500 and 750 students were tested.
Each test was designed to be administered in a 40-minute class period.
Since class period time restrictions were flexible in grades 1 through 8,
students in these grades had ample time to complete the tests. Class-period
schedules were not flexible in grade 9; thus, as Chapter VIII will note,
some ninth-grade students did not have sufficient time to complete the
Cloze Exercises. Each student had some advance training in the cloze
format. This ranged from several 20-minute training sessions in the primary

^5 Dr. Steven Kidder, a BSCR staff member, has studied the Rasch model
data resulting from this test administration of the multiple-choice cloze
with Dr. Benjamin Wright and his graduate assistant, Ronald Mead, at the
University of Chicago.
Table 5.4

Number of Students Tested per Form, Cloze Exercises
and Literal Comprehension, Details Test

            Cloze Exercises                 Literal Comprehension, Details Test
              (N = 5,264)                               (N = 5,197)
  Level I       Level II     Level III      Level I       Level II     Level III
 Form      N   Form      N   Form      N   Form      N   Form      N   Form      N
    1    128     13    14?     25    167     37    127     49    147     61    163
    2    126     14    152     26    164     38    124     50    152     62    162
    3    130     15    1??     27    160     39    126     51    148     63    164
    4    124     16    152     28    161     40    121     52    152     64    161
    5    126     17    146     29    158     41    119     53    145     65    165
    6    126     18    151     30    165     42    122     54    144     66    166
    7    127     19    152     31    158     43    124     55    145     67    154
    8    127     20    14?     32    163     44    124     56    149     68    163
    9    129     21    132     33    166     45    131     57    147     69    164
   10    127     22    152     34    159     46    123     58    148     70    156
   11    120     23    148     35    163     47    121     59    157     71    163
   12    123     24    149     36    165     48    121     60    145     72    154
  All  1,513    All  1,802    All  1,949    All  1,483    All  1,779    All  1,935

Note: Total 5,722. Distribution by grade: 1, 551; 2, 536; 3, 524; 4, 580;
5, 649; 6, 692; 7, 751; 8, 709; 9, 730.

Not all students were present for all tests.
grades to a 15-minute session in grade 4 and above.

Sampling Design for the Cloze Exercises

The Cloze Exercises were composed of three levels of parallel test
forms: Level I, for grades 1-3; Level II, for grades 4-6; and Level III,
for grades 7-9. The first problem in designing the survey was determining
how many readability levels to include in each testing level so that the
test would encompass the lowest and highest achievement levels of most of
the intended population. Table 5.5 shows how this problem was resolved.
The first testing level, Level I, was assigned the first 10 of the 28
readability levels covered by the cloze passages in the TDN. Passages in this
range begin with simple 25-word excerpts from grade 1 basal readers and
extend to 70-word passages sampled from grade 5 and 6 readers. Level
II was assigned readability levels 5 through 16; passages in this range
come from materials for grades 3 through 9. Level III was assigned
readability levels 11 through 22, with passages coming from materials for
grades 5 through 10. (Note: All passages at a given readability level were
considered equivalent regardless of the grade levels of their sources.)

With these readability ranges established for each testing level,
passages were then sampled from consecutive pairs of readability levels,
with the exception that readability levels 1 and 2 were each considered
separate sampling units. The sampling design for all three testing levels
is shown in Table 5.5. A set of 12 parallel test forms was assembled for
each level (i.e., Levels I, II, and III) by randomly sampling passages
repeatedly and without replacement from each sampling unit assigned to that
testing level. Table 5.5 shows the number of passages drawn for any test
form from a given sampling unit, i.e., readability level or pair of
readability levels. Thus, a test form for Level I sampled passages from
readability levels 1, 2, 3 and 4, 5 and 6, 7 and 8, and 9 and 10.
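
The form-assembly procedure just described can be sketched in code. The
following Python sketch is illustrative only: the function name
assemble_forms, the passage identifiers, the pool sizes, and the number of
passages drawn per unit are assumptions made for the example and are not
taken from Table 5.5. It shows one way of drawing passages without
replacement from each sampling unit so that no passage appears on more than
one of the 12 parallel forms.

```python
import random


def assemble_forms(pools, draws_per_unit, n_forms=12, seed=0):
    """Build n_forms parallel forms by sampling passages without replacement.

    pools: dict mapping a sampling-unit label to a list of passage IDs.
    draws_per_unit: dict mapping the same labels to passages drawn per form.
    """
    rng = random.Random(seed)
    # Shuffle a copy of each pool once; taking consecutive slices of the
    # shuffled list is equivalent to repeated draws without replacement.
    shuffled = {unit: rng.sample(ids, len(ids)) for unit, ids in pools.items()}
    forms = []
    for f in range(n_forms):
        form = []
        for unit, k in draws_per_unit.items():
            chunk = shuffled[unit][f * k:(f + 1) * k]
            if len(chunk) < k:
                raise ValueError(f"pool for unit {unit!r} is too small")
            form.extend(chunk)
        forms.append(form)
    return forms


# Hypothetical Level I setup: the units follow the text (levels 1 and 2 alone,
# then consecutive pairs), but pool sizes and draw counts are invented.
units = ["1", "2", "3-4", "5-6", "7-8", "9-10"]
pools = {u: [f"RL{u}-P{i}" for i in range(1, 25)] for u in units}
draws = {u: 1 for u in units}
forms = assemble_forms(pools, draws)
```

Fixing the seed makes the assembly reproducible, which is in keeping with
the emphasis the report places on reproducibility of test generation.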
Table 5.5



Sampling Design for Survey Test in the Textual Domain-
Reading/Literature for Cloze Exercises



Sampling unlti 

Readability - . Pf^ 



level 



S 



passage Pool by Grade Level o£ Passages 

pec 
unit 



8 9 10 



2i Level, I • 



4 ! i26 



1 

I 



I ^'|ir"r-"'~~"'i~ 



6 j ^ il2 4 Level II . | 

7 . * 6 13 7 I 
I I 9 4 I 

9 I I 11 11 6 

10 I ! 4 6 4 




t 



u j" I 6 11 a 

12 - I 1 -jr 3 Lever 

13 I I 6 11 2 



I 



14 I 13 I 

15 i I S 10 4 




16 I ^ I ^ 4 8 , ^ 

17 ' j * 6 ' 5 5 

18 I 3 9 7 

19 . I 4 4 

20 I 4 8 

22 I 7 
In the test administration, the test forms were systematically
distributed to obtain equal numbers of respondents on each test form by
classroom. This was done by packaging the forms for each classroom in
numerical sequence, repeating the order until the required number of tests
had been packaged for a classroom, and starting the sequence for the next
classroom where the previous one had left off.
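
The packaging rule above amounts to cycling through the form numbers and
carrying the position in the cycle from one classroom to the next. A minimal
Python sketch follows; the function name package_forms and the classroom
sizes are hypothetical and serve only to illustrate the rule.

```python
def package_forms(class_sizes, n_forms=12):
    """Return, for each classroom, the ordered list of form numbers packaged."""
    packages = []
    position = 0  # place in the form cycle, carried across classrooms
    for size in class_sizes:
        package = []
        for _ in range(size):
            package.append(position % n_forms + 1)  # forms are numbered from 1
            position += 1
        packages.append(package)
    return packages


# Three invented classrooms of 5, 14, and 7 students.
packages = package_forms([5, 14, 7])
```

Because the cycle never restarts, the number of respondents per form across
the whole sample can differ by at most one, whatever the classroom sizes.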

Sampling Design for Literal Comprehension, Details Test

The Literal Comprehension, Details test used in the survey was assembled
using the design shown previously in Table 5.5 for the selection and
assignment of passages to test forms. The one variation was at Level III
where, since available materials only encompassed readability levels 1-20,
readability levels 11 and 12 were regarded as separate sampling units.

Once the passages had been identified, five of the multiple-choice
wh-items accompanying them were selected by a process which ensured equal
representation of item types across test forms and readability levels.
Equal representation was not possible at readability levels 1 and 2, where
there were few adverbial items (i.e., how, when, where). This procedure
yielded three sets of twelve 30-item test forms: Level I, Level II, and
Level III. Except for the modification noted at testing level III, a
Literal Comprehension, Details Test form in a given testing level included
the same number of passages and range of readability levels as a test form
at the same testing level in the Cloze Exercises.

To verify the high level of passage dependency expected with the
wh-format, a "Part II" section was added to the test. Each Part II section
consisted of 12 wh-items without their related passages, two for each
readability level on Part I of the test. A table of random numbers was
used to assign items to test forms, with the conditions that the items on
Part II of any given test form should be unrelated to passages on Part I
and that each wh-item type should be represented at least once on a test,
but no more than twice.

Other Data in the Survey Design

Once the resultant data from the Cloze Exercises and the Literal
Comprehension, Details test were collected, arrangements were made to obtain
the response data for the same students on selected subtests of the California
Achievement Test (CAT) given in grades 1-8 in May, 1975. This data set
included all of the item responses for the reading and language arts
subtests and an IQ score derived from sections of the achievement battery.
These data permitted expansion of the research perspective to include
selected validity studies involving the Cloze Exercises and the Literal
Comprehension, Details test.

Conclusion

Previous chapters have presented a rationale for the multiple-choice
cloze format as a measure of literal comprehension and have further described
how this format was applied in the development of a testing system, referred
to as the TDN, which is ultimately intended to supersede conventional
comprehension testing systems with the more flexible and potentially more
useful approach implied in the domain-referenced testing model. An operational,
domain-referenced testing model is not based on a set of inflexible, fixed
tests. Rather, it has the facility to generate a test for virtually any
evaluation purpose by the working through of an algorithm which can produce
any number of test items as needed to survey a domain. Sampling procedures
are then applied to these item domains in an evaluation design that attempts
to eliminate, as far as possible, various sources of bias. Particularly
evident in the domain-referenced model is an improved potential for eliminating
much of the content or item bias that has been noted as a serious problem
in standardized tests of reading comprehension.

Unfortunately for the development of this project, there are few
operational models that can serve as detailed guidelines for the design
of domain-referenced tests, particularly for the development of testing
systems that are referenced to very large stimulus and response domains,
such as the virtually infinite field of comprehension as applied to the
domain of written discourse. This project is an attempt to apply the
domain-referenced testing model to very limited but still very large segments
of that domain. In the course of achieving this goal, the theoretical
elements of the domain-referenced model were carefully followed. Insofar
as possible at this time, the psychological meaning of the response required
by the multiple-choice cloze item format has been elaborated. This item
format has been brought to a highly objective state where it now seems to
have the essential objective, generative characteristic required by the
domain-referenced model. Further developments along these lines seem to
indicate the desirability of programming both the item generation and test
assembly procedures in a single integrated system for the purpose of
extending the domains included in the test.

The new cloze item format has been applied to a variety of content
domains relevant to the school setting to generate more than 13,000 test
items. A sufficiently large number of items is now available to assemble
domain-referenced tests of literal comprehension for a variety of evaluation
purposes. The present chapter explored a number of applications of the
testing materials in three widely-used forms of testing or evaluation.

^Actually, the domain-referenced model seems to imply little about
meeting the conditions for construct validity. Messick (1975) indicates that
this is no less a requirement for domain-referenced tests than for many other
tests of ability and achievement used in education.
Presently the project has turned toward applying the cloze testing
materials in a survey design in an urban district. The results of this
application showed that, even at this stage where the testing materials
are yet in a rudimentary "paper-based" state, large numbers of test forms
could be quickly assembled to mount a multi-matrix sampling design in a
large population of readers with a broad range of reading levels. The
data resulting from this test administration are currently under analysis
both to permit refinement of the item format and to explore its validity.
As Chapters VIII and IX will show, the preliminary data seem to indicate
that the item format does not yet require any major modification. In
addition, there seems to be substantial indication that the overall test
fits the theoretical and practical model constructed for it. Although these
results can only be taken as preliminary or tentative, the promise
offered by the testing model appears to have been justified to the extent
that additional or expanded work on the testing materials is warranted.
CHAPTER VI

TEST VALIDATION AND REFINEMENT

Starting with an analysis of measurement needs in the area of reading,
this report has proceeded by stages to describe the development, purpose,
and characteristics of a new approach to assessing a basic level of reading
comprehension: literal comprehension. The current state of this assessment
system which, as noted, consists of some 1,374 cloze passages, associated
items, and other testing materials, provides a broad foundation for the
study of its validity. The approach planned for validating and refining
the cloze format is a series of concurrent efforts designed both to study
the meaning of the test and to bring the testing materials to a broadly
usable state.

This chapter first provides an overview of the activities underway and
planned for research and development on the cloze segment of the TDN. This
plan includes the calibration of the test passages, studies of the validity
of the cloze exercises as a test of achievement and ability, and research
designed to make the test broadly usable in practice. The focus of the
discussion from there is on the latter two topics, with most attention given
to the topic of test validity. The problem of calibrating the test passages
is treated at length in the next chapter.

Research and Development Overview

The overview of the research and development plan for the cloze
exercises is shown in Figure 6.1. Examination of this figure shows that
this research is embedded in a broader issue referred to as productivity
research. The cloze and other segments of the TDN are being developed in
part because of the need for improved measures of school output in studies
of productivity in reading instruction. However, since the concern of this
document is with the testing materials, the forthcoming discussion deals
only with the lines of research activity projected for the TDN in Figure
6.1.

The first line of this research on the cloze exercises refers to the
problem of calibrating the test passages on a single, equal-interval scale.
As shown in Figure 6.1, preliminary calibration studies of the test passages
have been underway for some time. Using the May-June test administration
as a data source, a new computer program (Wright and Mead, 1975) was applied
to determine the applicability of the Rasch model. The Rasch model appeared
to accurately define the trait underlying the cloze exercises, and, as a
result, the additional stages of the calibration research shown in Figure 6.1
were justified. As described in the next chapter, these additional stages
project a further period of experimentation with the model, followed by a
general application to the total pool of cloze passages.

The second line of research activity for the TDN, which is concerned
with different types of test validity, was begun as a series of preliminary
validity studies, based also on the May-June test administration. This
initial effort, which continues to date, is expected to provide a basis for
planning a second stage of test validation, identified in Figure 6.1 as
"Short-Term Validity Studies." In these studies, a number of critical issues
relating to alternate interpretations of cloze test scores, not all of which
are necessarily identified to date, will be investigated with small student
samples. Subsequently, the testing materials are again to be adjusted or
the underlying concept realigned before mounting a large-scale study of the
test using virtually the total population of an urban school district of
13,000 students. This larger study will combine the features of short-term,
longitudinal, and cross-sectional studies in an intensive effort to further
clarify the psycholinguistic meaning of cloze test scores across the grade
1-12 student population.

Concurrent with both the calibration and validation research, an
effort will be initiated to program the current item and test generation
procedures into a generally exportable routine that will be usable in various
settings, such as state education departments, city districts, and regional
institutions that coordinate complex technical educational services. In
addition, workable models and guidelines will be drawn for using the test
assembly procedure for supporting a variety of evaluation purposes, ranging
from complex evaluation studies of reading programs to diagnosing the level
of reading materials that an individual can comprehend literally.

Framework for Studies of Test Validity

As noted previously, work on test validity has been organized into
the three broad lines of activity identified in Figure 6.1. All of these
activities have to do with either establishing a basis for test validity
or investigating the validity of the cloze test format. Before describing
the specific research activities planned for examining the validity of the
cloze exercises, the framework used for determining the kinds of validity
studies deemed necessary is made apparent. Table 6.1 summarizes this
framework.

The first type of validity referred to in Table 6.1 is content validity.
Content validity focuses generally on demonstrating how well the test samples
the classes of situations to which a test score is to generalize. Detailed
Table 6.1

Summary of Relevant Types of Validation^a

Content Validity
  Questions: How well does the test sample the universe of responses and
    situations about which conclusions are to be drawn?
  Sample types of investigation: (1) Retrace the sampling of content.
    (2) Determine reproducibility of item generation. (3) Determine
    agreement on content categories sampled. (4) Determine reproducibility
    of test generation.
  Sample types of judgments/decisions made: (1) Judge degree of bias in
    content representation. (2) Identify subjectivity in procedures used
    to generate items or tests. (3) Adjust content representation in the
    test.

Educational Importance
  Questions: Does the test measure an important educational outcome?
  Sample types of investigation: (1) Demonstrate relationship of test to
    educational objectives. (2) Demonstrate that the characteristic
    measured by the test is valued and of practical importance in a
    variety of situations.
  Sample types of judgments/decisions made: (1) Test is considered a
    relevant measure of outcomes. (2) Potential social and educational
    utility is determined.

Construct Validity
  Questions: Does the test measure what it purports to measure? Is the
    characteristic measured by the test influenced by the educational
    process? What is the meaning and interpretation of a test score?
    What are the educational and social consequences of using the test?
  Sample types of investigation: (1) Studies of convergent and
    discriminant validity. (2) Relationship of test score to amount of
    schooling and short-term interventions. (3) Developmental studies of
    consistency across populations and domains. (4) Studies of response
    processes by age. (5) Studies of dimensionality.
  Sample types of judgments/decisions made: (1) Determine possible
    meanings and uses of test scores in practical situations. (2) Modify
    the test to improve consistency with construct and interpretations.
    (3) Modify and extend the meaning [...] confounding the test.

Placement
  Questions: Is performance improved when students are assigned to
    instructional materials or conditions on the basis of test scores?
  Sample types of investigation: (1) Assign students to instructional
    levels and ranges of materials using test scores and compare to
    unassigned students.
  Sample types of judgments/decisions made: (1) Determine ability to
    generalize from a test score to instructional and other reading
    contexts.

^a Adapted from Cronbach (1971, p. 446).


evidence of content validity is of particular relevance for the cloze
testing materials since it is assumed that a score on any test assembled
from the testing system can be interpreted directly in terms of a person's
ability to read in a specific universe of written discourse. In establishing
content validity of a domain-referenced test, the investigator must
demonstrate that the test accurately samples the domains to which the test
is intended to generalize. Also of concern here is the adequacy of the
universe definition and the objectivity or reproducibility of item
construction.

Content validity is established largely by empirical means, for example,
by referring the content sampling plan to test users. The importance of
what is measured by the proposed test is established largely in the
theoretical statement which defines the construct underlying the test.
That is, the definition of what the test purports to measure not only
establishes the theoretical importance of the test, but also explores the social
and practical implications of test use (Messick, 1973). Evidence of the
importance of a test is thus initially a problem of logical coherence and
the adequacy of the construct definition, but it is also ultimately determined
by empirical results which reflect negatively or positively on the network
of concepts defining the test and its uses. The importance of the cloze
segment of the TDN seems to be adequately established in the construct
definition, but still to be shown is evidence that the test accounts for
something educationally and psychologically meaningful.

The third type of validity relevant to planning the course of research
on the cloze format is construct validity. Because the cloze exercises
claim to measure a particular type of comprehension and because this claim
has a number of very important implications from theoretical,
decision-making, social, and policy points of view, it is critical that the
psycholinguistic and practical implications of scores from the test be
explored and established (Cronbach, 1971; Messick, 1975). In educational
measurement, there has been a tendency to regard an achievement test as
terminally valid if its importance (measures recognized objectives) and
content validity are established. Or, a new test of achievement may be
validated against several established tests, whose validity ultimately
also rests on older claims of importance and content relevance. However,
Messick (1975, p. 956) has pointed out that:

    . . . even for purposes of applied decisionmaking, reliance
    upon criterion validity or content coverage is not enough.
    . . . the meaning of the measure must also be pondered in
    order to evaluate responsibly the possible consequences
    of the proposed use.

For the cloze segment, this view requires a series of interrelated
studies, some involving investigations of convergent and discriminant
validity, some relating to the consistency of the test across populations
and situations, and still others involving examination of the process of
responding to the test or the effects of instructional interventions, to
name a few. Examining the construct validity of the cloze test, it will
be seen, consumes the larger part of the present validation effort.

The last type of validity identified in Table 6.1 refers to the utility
and accuracy of the test in instructional decision making. The cloze testing
materials are intended to provide a basis for a variety of educational
placement-type decisions, such as determining the degree of fit of texts in
a given content area with the comprehension abilities of groups of students
or assigning texts in a particular range of readability to an individual
student.
Studies of Test Validity

Continuing or planned studies of the validity of the cloze testing
materials are reported below in the organization presented previously in
Figure 6.1. The types of validity relevant to establishing the cloze
format defined in Table 6.1 are reflected throughout this report, with the
exception of the factor of importance, which, it is felt, was substantially
established in the theoretical discussion of literal comprehension in
Chapters II, III, and IV.

Preliminary Validity Studies

Two avenues of investigation of test validity were initiated in the
preliminary phases of research on the cloze test. The major part of the
preliminary effort is based on the May-June test administration and has
largely to do with refining the test and tentatively exploring its construct
validity. The second avenue of the research constitutes the beginnings of
a series of content validity studies. Each of these investigations is
discussed in turn.

Construct validity. The organization of the data collection for this
component of the preliminary phase of the validation effort was presented
in the latter part of Chapter V. To recapitulate briefly, this study
consisted largely of gathering data from the administration of three types
of tests in a grade 1-9 population of 5,000 students: (a) a multiple-choice
cloze test; (b) a multiple-choice comprehension test composed of modified
wh-items; and (c) a standardized achievement test given annually by the
study district. The cloze and wh-tests were initially conceived as two
different measures of the same construct of literal comprehension, with the
latter test having been constructed because an adequate, alternate measure
of the construct was not available. The standardized test used in the district,
which was the California Achievement Test or CAT (Form A), contained
measures which potentially converged or diverged with the concept of
literal comprehension measured by the cloze test. A list of the variables
defined by these tests is given in Table 6.2.

Table 6.2

List of Variables Measured by Tests Included in
the Preliminary Validity Study

Cloze Test
  Total Cloze Score
  Cloze Paragraph Scores
  Noun Score
  Verb Score
  Adverb Score
  Adjective Score

Wh-Test
  Total wh-score
  Passage Independence
  How score
  What score (noun, pronoun)
  What score (verb)
  When score
  Where score
  Which score
  [illegible] score
  [illegible] score

California Achievement Test^a
  Vocabulary
  Letter recognition
  Word forms
  Word recognition
  Picture-word association
  Words in context
  General Comprehension
  Locate [illegible]
  Interpretation
  Relationships
  Generalizations
  Draw inferences
  Comprehension/Social Studies
  Comprehension/Science
  Comprehension/Mathematics
  Language Skills
  Sentence structure
  Transformations
  Mechanics
  Usage
  Verbal IQ
  Non-Verbal IQ

^a Not all subscores listed are available for each test population.

This initial test administration had several purposes. The first
objective was to combine logical analysis of the consistency of application
of the multiple-choice cloze item form with conventional and Rasch item
analysis data with the intent of conducting a first refinement of the total
item pool. As expected, this activity led to a number of changes in the
rule system for selecting and processing a passage in the multiple-choice
cloze format. This activity further resulted in a major revision of the
item pool which ultimately affected an estimated 85% of the 1,374 passages
already on paper-punch tape.

A second major purpose involved use of the Rasch item statistics in
determining the adequacy of a unidimensional model in accounting for the
hypothetical underlying trait of literal comprehension across so many
different cloze test forms (N = 36) and populations. In this activity, the
distributions of item difficulty (more appropriately called item easiness
in the Rasch model output) and ability were also examined in detail to
determine the extent to which the testing system was consistent with the
domain-referenced model.
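
For reference, the one-parameter logistic (Rasch) model relates a person's
ability and an item's difficulty, both expressed on a common logit scale,
to the probability of a correct response. A minimal Python sketch follows.
The function names are hypothetical, and the conversion from easiness to
difficulty assumes that easiness is simply the negative of difficulty, which
may not match the scaling used in the Wright and Mead program.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def easiness_to_difficulty(easiness):
    # Assumption: easiness is reported as the negative of difficulty on the
    # same logit scale.
    return -easiness


# A person whose ability equals the item difficulty has an even chance of
# answering correctly; higher ability raises the probability.
p_equal = rasch_probability(0.0, 0.0)
p_higher = rasch_probability(1.0, 0.0)
```

Under this model a single ability parameter per person and a single
difficulty parameter per item are all that determine the response
probabilities, which is the sense in which the model is unidimensional.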

A third major purpose involved examining the internal consistency of
the cloze test and the wh-item format through conventional item analysis
techniques. In this effort, the various part scores of each test type were
intercorrelated and the Kuder-Richardson Formula 20 reliability coefficient
was calculated on the total test score on each test form (N = 72 test forms).
In addition, for the cloze exercise the correlations of noun, verb,
adjective, and adverb subscores with total test scores were calculated,
corrected for the correlation of each part score with itself in the total
test scores (N = 36 test forms). Together, these analyses reflected the
consistency and uniqueness of the four types of deletions made in the cloze
item format.
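
The two statistics named above can be illustrated with a short Python
sketch. It is a sketch under stated assumptions rather than the procedure
actually used in the study: the function names and the tiny response matrix
are invented, KR-20 is computed with the population variance of total
scores, and the part-total correction is done by correlating each part
score with the total minus that part.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a list of 0/1 item-response vectors."""
    k = len(responses[0])                      # number of items
    totals = [sum(r) for r in responses]
    var_total = pvariance(totals)              # population variance of totals
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    pq = 0.0
    for i in range(k):
        p = mean(r[i] for r in responses)      # proportion correct on item i
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_part_total(part, total):
    """Correlate a part score with the total score minus that part."""
    rest = [t - p for p, t in zip(part, total)]
    return pearson(part, rest)


# Invented data: five examinees, four dichotomously scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(r) for r in responses]
part = [r[0] + r[1] for r in responses]        # e.g., a two-item subscore
reliability = kr20(responses)
r_uncorrected = pearson(part, totals)
r_corrected = corrected_part_total(part, totals)
```

Removing the part from the total before correlating avoids the inflation
that arises because a subscore is itself a component of the total score.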

Finally, in an attempt to examine convergent and discriminant validity,
the various subscores of all three types of test were intercorrelated for
each CAT test level population (Level I, II, III, and IV). These analyses were
designed to yield a set of validity coefficients which could be examined
for consistency with theory. This analysis remains very tentative at this
point due to the difficulties involved in accurately expressing the
psycholinguistic meaning of what is measured by the various items included in
the CAT. Because of arguments raised in Chapter II, strong confidence
cannot be placed in the tests based on the wh-item format either, particularly
as used here, where the test was given across such a wide age range.

Currently, the intercorrelations of the subscores in the three types
of tests available for this phase of the investigation are being subjected
to a principal components analysis and varimax rotation for each level of
the sample. The results of this effort will be made available at a later
point in time.

Content validation. Work in this area was begun in the fall of 1975
with the selection of a regional, representative sample of 192 school
districts. The cooperation of the individual school districts in this
sample has now been secured, and in early 1976, each district will receive
first a lengthy questionnaire and instructions. This questionnaire will
include a set of labels defining major clusters of the objectives of reading
instruction. In addition, the respondent will receive a set of scaled
passages chosen from the wh-question pool to represent readability levels
1-20. The respondent is to indicate the clusters of objectives taught in
reading instruction in the grades, their levels of emphasis in instruction,
and the associated ranges of passage difficulty for the reading materials
used in instruction with each grade level population.

In part, the study is intended to define the levels of passage
difficulty that students in various grade levels are routinely expected
or believed to master in school, i.e., are presumed able to comprehend at
the literal and higher levels of reading comprehension. The school sample
is well defined, thus enabling the partitioning of findings by region;
by urban, suburban, and rural districts; and by characteristics of the
student population.

This initial effort at content validation is expected to provide a
basis for defining the ranges of passage difficulty reflected in the
instructional experiences of part of the student population for which the
cloze testing materials are intended. These data will become part of a more
expansive effort to define the content domains of reading instruction from
different perspectives.

During the spring of 1976, this same study sample is expected to respond
to a second questionnaire which will attempt to obtain a more representative
definition of the content categories that encompass the most frequent types
of written discourse experienced by grade 1-12 students in the school and
extra-school environments. Based on the results of this study, the content
of the cloze component of the TDN will be further adjusted and a more
definitive effort will be made to describe the readability characteristics
of types of written discourse by grade level in its relation to other
relevant situations. The size of each major stratum of written discourse
identified as important to a level of the 1-12 student population will be
estimated. For each such stratum, and using a 95% confidence level,
readability samples of 100 word passages will be taken. Subsequently, the
readability levels of each sample will be computer analyzed using the
Dale-Chall Index; the resultant distributions of readability will be arrayed;
and means and standard deviations calculated.
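
The readability step described above can be sketched in Python as follows.
The formula constants are those of the published Dale-Chall (1948) raw score;
the short familiar-word set, the two sample passages, and the function name
are stand-ins introduced only for illustration, and a real analysis would use
the full Dale word list and 100-word samples.

```python
import re
import statistics

# Stand-in for the Dale list of roughly 3,000 familiar words (assumption:
# a real analysis would load the full published list).
FAMILIAR_WORDS = {"the", "a", "dog", "ran", "to", "house", "it", "was", "big"}


def dale_chall_raw_score(passage, familiar_words=FAMILIAR_WORDS):
    """Dale-Chall (1948) raw score for one passage."""
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    words = re.findall(r"[a-z']+", passage.lower())
    pct_unfamiliar = 100.0 * sum(w not in familiar_words for w in words) / len(words)
    avg_sentence_length = len(words) / len(sentences)
    return 0.1579 * pct_unfamiliar + 0.0496 * avg_sentence_length + 3.6365


# Hypothetical passages standing in for one stratum's readability sample.
sample = [
    "The dog ran to the house. It was big.",
    "The dog ran to the enormous house. It was magnificent.",
]
scores = [dale_chall_raw_score(p) for p in sample]
print(round(statistics.mean(scores), 2), round(statistics.stdev(scores), 2))
```

The mean and standard deviation printed at the end correspond to the summary
statistics the text proposes to calculate for each stratum of written
discourse.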



The study will also contrast the readability ranges reported by
teachers with the readability ranges of the passages included on
major standardized tests of reading comprehension.

The results of the foregoing analyses are expected to provide an
adequate basis for adjusting the content of the cloze component of the TDN
prior to any serious effort at calibration or more extensive validation.
In addition, the information on expected and actual readability levels of
relevant areas of written discourse will provide a basis for defining a
normative context for the interpretation of test scores resulting from the
administration of the cloze forms. The sample distributions of readability
levels of written discourse in various categories are required to provide
a basis for generalizing from a score to a domain of written discourse.

Short-Term Validity Studies

Short-term validity studies, referred to previously in Figure 6.1,
constitute a group of related efforts designed to address some crucial
issues of test score interpretation. The results of these studies will
guide the revision of the cloze exercises in preparation for further
calibration studies and longer-term, more expensive efforts at test
validation. Table 6.3 offers a framework for organizing these studies
in terms of four groups of variables in the test situation that may affect
the interpretation of a test score based on the existing cloze format. The
first group of variables refers to important global characteristics of the
types of written discourse now included in the test, variations of which
may cause the test to measure other than the hypothesized literal
comprehension factor. The test format is presumed to be an effective mode for
measuring literal comprehension that is essentially invariant across written
material that differs in terms of specific content, semantic complexity,
or syntactic complexity. Contextual constraint refers to a specific problem
recognized in processing passages into the multiple-choice cloze format.
This is the issue of whether the meaning of the test score varies as a
function of the relative immediacy of the passage context that in turn
determines the response to a given deletion in a passage. This context may
be intra-sentential, intersentential, or extra-sentential, depending on
how the rules for deletion are applied.

Table 6.3

Framework for Organizing Short-Term Validity Studies

          Relevant Sources of Variation in the Test Situation

Passage*               Response               Context       Person

Contextual constraint  Semantic competition   Training      Personality
Syntactic complexity   Syntactic competition  Instructions  Oculomotor skills
Semantic complexity    Content words          Time          Orthographic competence
Content area           Idioms, metaphors      Examiner      Phonological competence
                                                            Semantic competence
                                                            Syntactic competence
                                                            Specific knowledge
                                                            Speed
                                                            Verbal reasoning
                                                            Non-verbal reasoning
                                                            Comprehension
                                                              Literal
                                                              Non-literal
                                                            Test-wiseness

*Passage factors are represented by a multitude of variables. For
example, the semantic component includes vocabulary load, metaphorical usage,
level of abstraction, etc.

The preliminary validity studies have to date provided a holistic
evaluation of the combined effects of content area, syntax, and semantic
factors on the interpretation of cloze test scores, since all of these
factors are indirectly if somewhat inefficiently measured by application
of the Dale-Chall index to the test passages. Additional studies where
these factors are examined in isolation in terms of their effects on test
score interpretation are planned for the spring and summer of 1976. It is
intended that the issue of the effects of contextual constraint on cloze
test scores will also be studied at that time.

The general approach to the studies of passage effects will be to
vary passages on the particular dimension of interest, such as syntactic
complexity, while holding other dimensions constant. The contextual
conditions within passages that appear to primarily determine the response
process will also be categorized separately. Correlational analysis may
then be used to analyze the contributions of selected person factors
(column 4 in Table 6.3) to item and passage responses as a basis for
inferring potential changes in the meaning of a cloze test score produced
by one or more passage variables. This approach to analysis should provide
important leads concerning the passage variations that interact with cloze
deletion rules and that may cause the test to unduly emphasize general
reasoning or other factors as opposed to those factors which are presumed
to contribute to a literal understanding of the meaning of a given sample
of written discourse in a level of the student population.²

²What might be expected is that syntactic or semantic competence are
emphasized at varying levels of complexity by a passage or particular
deletion depending on its organization, content, and structural complexity.

A second group of studies is planned for spring and summer, 1976, to
address unresolved problems surrounding the preparation of distractors for
a cloze passage (Table 6.3, column 2). According to the theory surrounding
the test, semantic competition among distractors will unduly emphasize
knowledge and reasoning skills which are essentially extraneous to the
literal comprehension of the passage and which may be synonymous with other
more complex measures of comprehension or intelligence. This interpretation
needs to be evaluated in different strata of the population with both
semantically interfering and non-interfering distractors applied to the
same passages. Similarly, the adequacy of using grammatical class as a
basis for selecting distractors, and thus presumably controlling for
syntactical competence alone being the basis for selecting the correct
answer on the cloze test, needs to be examined before the computerized
distractor generator can be reasonably finalized.

A final issue of this type, identified last in column 2 of Table 6.3,
will be examined in a brief study designed to determine the possible need
for modifying the distractor generation process when the word deleted from
a passage is part of an idiom or a metaphor, or is a specific subject
matter word. Presently, the rules governing the generation of distractors
for these types of deletions are simple and straightforward. The American
Heritage Word Frequency Book (Carroll et al., 1971) is used to identify
specific content words in a passage, and metaphorical and idiomatic language
are largely ignored as a basis for modifying the rules for distractor
generation.³

³Under some conditions, a word that forms part of an idiom may not be
a candidate for deletion (see the rules for generation of distractors in
Appendix A).

The third group of short-term studies that is required before the cloze
format can be effectively modified for large-scale validation and calibration
is identified in the third column of Table 6.3 as the testing context. The
amount of training needed to standardize the test across a widely ranging
student population is an important research issue because the test uses a
new, unfamiliar format. The related matters of test instructions and
examiner behavior will also be examined in applied research which will
attempt to determine the conditions under which test-taking motivation may
be maximized while minimizing the effects of guessing on the test score.
Finally, this element of the research effort will attempt to define optimal
amounts of time for test-taking. The intent of this area of the research
will be to produce an examiner and examinee training package that will
orient the test-taking situation to one that is more psychologically attuned
to the taking of a domain-referenced test.

In concluding this brief presentation of the short-term phase of test
validation, it seems necessary to point out that the foregoing set of
planned studies represents only some of the larger issues of validity
recognized to date in work on the cloze. In practice, the present approach
to investigating the effects of a particular variable on what the cloze
format measures is likely to be relatively holistic. For example, in
actually measuring semantic complexity, it may be necessary to focus on
traditional approaches based largely on vocabulary load. Semantic
complexity has a potentially large number of referents and dimensions and,
as an area of investigation relating to the cloze process, could easily
consume the whole of the resources devoted to this research. Wherever
possible, the tendency will be to use an existing measure. The exceptions
to this approach will be in the work described in the next section, which
will attempt to create an alternate and less ambiguous criterion for
measuring literal comprehension as well as mount a parallel effort to create
measures of syntactic competence and syntactic complexity.

A Longitudinal, Cross-Sectional Validation

A one-year study combining the elements of short-term, longitudinal,
and cross-sectional designs will constitute the basis for a broader
examination of the construct validity of the cloze testing materials.
This phase of the research will attempt to address some larger issues
concerning the validity of the multiple-choice cloze format, while continuing
to provide a context for examining research issues previously identified.
That is, detailed analysis of items, passages, and item format variations
will continue here and even be extended to additional content areas. However,
the focus of this phase of the research will be on the developmental and
instructional implications of the construct of literal comprehension.
Sample questions of interest will be: To what extent do literal
comprehension scores change in relation to the passage of instructional time
over 12 grades? Is the development of literal comprehension, as measured
by the tests, continuous over the school years? Or does more development
occur in some years than in others? Which students develop the skill most
rapidly? To what extent is the development of literal comprehension
influenced by manipulable home and school factors? By non-manipulable home
and school factors? And finally, to what extent does the development of
literal comprehension affect other school learning tasks? Does literal
comprehension contribute to academic and personality development in the
school?

Study sample. The proposed study is to take place in the same urban
district that provided the data for the preliminary phase of the research,
except that it is expected that virtually the total student body will
participate in the longitudinal study. This will provide a heterogeneous
sample with a size of more than 1,000 students per grade level at each of
grade levels 1-12. The idea of conducting the study in this single district
contributes substantially to the economy and feasibility of the research,
while, it is felt, not greatly affecting the ability to generalize results.
The composition of the student body is fairly representative of the major
cultural and economic strata of the New York State population, except that
minority elements tend to be somewhat overrepresented. The district has an
excellent standardized testing program, keeps accurate and complete records
on the student population, and is very supportive of research. There is
considerable positive communication and interaction between the district and
the larger community, thus creating the conditions that will be needed to
obtain the parent interview and questionnaire data to be collected in the
study. Finally, because of substantial special Federal and State monies
annually infused into the district, there is extensive variation both between
and within schools in the amount of time and resources devoted to reading
instruction. This variation will allow examination of the relative effects
of school factors on cloze test scores as contrasted with their effects on
other measures of reading.

General study design. The categories of measures to be collected in
the study are listed in Table 6.4. This is essentially an expansion of
Table 6.3 to include measures of school and home factors. Selected
personality and achievement test factors have also been added to the "person"
column. In large part, adequate measures of these factors are either
available from the study district's testing program or from previous
research of this type (cf. Kidder et al., 1975). The new instrumentation
that will be constructed for the study includes a measure of syntactic
complexity applicable to reading materials, a measure of syntactic competence
applicable to the student population, alternative measures of literal
comprehension, and questionnaire and interview schedules that will be used
to determine the breadth and complexity of the reading experiences of the
student population after the manner described by Chomsky (1972). The
selection and organization of variables and measures, as shown in Table 6.4,
provides a basis for defining the syntactic and semantic components and
levels of the major dependent variable of interest (literal comprehension)
while attempting to trace the contributions of immediate and contemporary
antecedents of reading performance in the home and school.

Table 6.4

Framework for Organizing Variable Measures in the One-Year Validation Study

Passage               Response          Person                      School                  Home

Syntactic complexity  Content words     Personality                 Complexity of reading   Socioeconomic status
Semantic complexity   Idioms/Metaphors  Anxiety                       materials             Cultural background
Content area                            School satisfaction         Instructional time      Parent-child reading
                                        Self esteem                 Instructional mode        experiences
                                        Orthographic competence     Learning environment    Child's reading
                                        Phonological competence     Teacher age               experiences
                                        Syntactic competence        Teacher experience
                                        Word knowledge
                                        Specific knowledge
                                        Mathematics achievement
                                        Science achievement
                                        Social Studies achievement
                                        Language Arts achievement
                                        Verbal reasoning
                                        Non-verbal reasoning
                                        Comprehension
                                          Literal
                                          Non-literal factors
                                        Age (months)
                                        Grade level

ᵃAvailable from the district standardized achievement testing programs.
ᵇTo be constructed for the study.

The procedures for assembling the primary measures of literal
comprehension to be used in the study will initially involve the selection of
some 600 cloze passages from the TDN distributed equally across the total
range of readability levels and in the proportions of 2/6 for the
reading/literature category and 1/6 for each of the additional textual
categories.⁴ These passages will provide the raw material for the calibration
pilot described in Chapter VII, with the result that the passages will be
placed on a common, equal-interval scale.

Subsequently, the scale for the reading/literature stratum of 200
passages will be divided into 30 equal intervals. Each successive set of
6 such intervals will constitute a test level, with no overlap between levels.
Stratified random sampling will then be applied to the passages in the
intervals in each test level to obtain 6 parallel test forms of 6 passages
each per test level and a total of 30 forms across test levels. The number
of passages required for this design is 180 (36 passages x 5 test levels).

The foregoing design will be repeated for each additional textual area,
except that the design parameters are 100 passages from which 3 passages
will be sampled from each interval in a test level. This design will
require 90 passages and will result in 3 parallel test forms of 6 passages
per test level.
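
The arithmetic of this design can be checked with a short Python sketch of
the stratified random sampling step. The passage identifiers, the even spread
of passages over the 30 scale intervals, and the function name are
assumptions made for illustration; in practice the number of passages falling
in each interval would depend on the calibration results.

```python
import random


def assemble_parallel_forms(pool, n_levels=5, intervals_per_level=6,
                            forms_per_level=6, seed=0):
    """Build parallel forms; pool maps interval index -> list of passage ids."""
    rng = random.Random(seed)
    forms = {}
    for level in range(n_levels):
        level_forms = [[] for _ in range(forms_per_level)]
        first = level * intervals_per_level
        for interval in range(first, first + intervals_per_level):
            # Draw one distinct passage per form from this scale interval.
            drawn = rng.sample(pool[interval], forms_per_level)
            for form, passage in zip(level_forms, drawn):
                form.append(passage)
        forms[level] = level_forms
    return forms


def passages_used(forms):
    return {p for level in forms.values() for form in level for p in form}


# Hypothetical pools: 200 reading/literature passages and 100 passages for one
# additional textual area, spread evenly over 30 equal scale intervals.
reading_pool = {k: [f"R{p:03d}" for p in range(200) if p % 30 == k] for k in range(30)}
content_pool = {k: [f"C{p:03d}" for p in range(100) if p % 30 == k] for k in range(30)}

reading_forms = assemble_parallel_forms(reading_pool, forms_per_level=6)
content_forms = assemble_parallel_forms(content_pool, forms_per_level=3)

print(len(passages_used(reading_forms)))  # 180
print(len(passages_used(content_forms)))  # 90
```

Each form takes exactly one passage from each of the 6 intervals in its test
level, which is what makes the forms within a level parallel with respect to
the underlying scale.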

A special placement test will then be constructed from the
reading/literature passages for the purpose of estimating the test levels of
individual students. Thereafter, the student population will be assigned
to test forms by test levels within grade levels using random stratified
sampling to insure equal numbers of students across test forms. This
process will be repeated for the test forms in each additional textual
area, using each time a different randomly constituted one-fourth of the
student population at each grade. Students in each grade level population
will thus receive two parallel cloze test forms on each of three test
occasions during the school year (one in reading/literature, one in a
content area), with a different set of forms used on each test occasion.
The outlines of this design are shown in Figure 6.3 by identifying each test
occasion with a C.

⁴It will be necessary to add some new passages at the upper ends of
the scale for the 11th and 12th grades.

At least two additional testing sessions will be required of each
student in the study sample. At the pretest and posttest occasions
identified by W's in Figure 6.3, each student will receive on each occasion
a different test form that constitutes an alternate measure of literal
comprehension. These test forms will be assembled from the wh-item pool
using a design similar to that described in Chapter V, except that the
passage independent section of the test will be replaced by main idea and
title items. (For each of the 300 wh-item passages, there are available
up to 4 verbatim or derived main idea and title items.) Additional
alternative measures of literal comprehension based on the paraphrase
transformation and interviews will also be administered to small subsamples
of the study population.

The foregoing design will require approximately 120 minutes of testing
time per student on the pretest and posttest occasions, and 80 minutes at
the interim data point. This element of the study design will provide a
basis for estimating the mean and variance of literal comprehension change
scores across the school year by grade level. Because all such estimates
are referenced to the same scale, it should prove possible to estimate
the total possible change across the scale by suitably combining cross
sections or grade levels. The variation in change scores within grade
levels may also be partitioned to determine the contributions of school
and family factors to the development of literal comprehension. This same
analytical process can also be followed for the standardized reading test
scores available on the grade 2-9 student population, where pretest and
posttest scores will be available for each of the populations in these
grade levels.⁵

Figure 6.3. Longitudinal and cross-sectional components of the test
administration for major dependent variables in the one-year validation study.

Many of the details of the foregoing design have yet to be set.
Some elements of the design must be restricted to small cross-sections of
the student population due to the costs of developing adequate
instrumentation. For example, the measures of syntactic complexity and
competence will very likely be restricted to the elementary grades because
of the necessity of validating the measures to be used (Athey, 1975; Finn,
1975).⁶ However, with these exceptions, it is expected that the basic
outlines of the design can be installed, with the result that data will be
available on many interesting questions about the psychological meaning of
the construct, literal comprehension, and of some of its operationalizations,
including the cloze format. Because of some of the unique features of this
design, e.g., common item formats administered at each level of population
and a single underlying scale to which test scores can be referenced, the
results of this study should be of broad interest to the professional reading
community.

⁵The CAT offers an ADSS scale score which allows the combination of
test forms and levels on one scale, thus making the analysis possible from
a psychometric point of view. However, unlike the cloze test, the content
and format of the CAT changes radically from level to level. It would prove
interesting, however, to determine how much the student population changes
on this scale and how much of this change is associated with school learning.

⁶Personal communications on this issue indicate extensive costs for
developing instrumentation across grades; however, some models exist
for use in grades 1-6.

Generation and Use of Items and Tests

Concurrent with the areas of test validation, identified in Figure
6.1 at the outset of this chapter, efforts will continue to improve the
reproducibility and exportability of the item and test generation procedures.
Also, this section of the research program will produce material that will
help potential users apply the cloze testing materials to practical
problems, ranging from individual measurement problems to large-scale
evaluations of reading programs within and between school districts. This
final section of the present chapter briefly discusses the overall content
of each of these efforts.

Test Item Generation

The test item generation process is now largely a continuous, human
operation that begins with the sampling of passages from sources and
concludes with a finished passage in the multiple-choice format. Aside
from the sampling of original passages, it appears that this entire routine
can be computer programmed in the interactive mode, with the laborious and
repetitive components of item generation handled entirely by the computer.
For example, following the sampling of a passage, the computer would analyze
passage readability and identify various potential deletion patterns while
also identifying the characteristics of each deletion pattern (percent of
text deleted, percent of nouns deleted, etc.). The person interacting with
the computer would then indicate a particular pattern of deletion, would
further indicate the word lists to be accessed for each deletion, and would
finally be presented with a tentative clozed passage. The resultant cloze
passage would be inspected for any departures from the deletion rules, word
lists would be further accessed as needed, and, finally, the completed
passage would be programmed for type style, letter size, layout, etc.

This projected automation of the item generation routine will make
the process of test development generally exportable. This activity
must be carefully integrated with the research planned on the validity of
the item formats.

Test Assembly

The test assembly process is yet relatively crude, consisting largely
of assembling the passages relevant to a given evaluation design into tests
and delivering the associated paper-punched tapes to the printer. Over the
course of the research, it is expected that the test assembly process will
also be programmed in the interactive mode to operate on the passage pool
and enable an evaluator or researcher to assemble a test or tests for a
particular purpose.

This process first requires that the pool of passages exist in a computer
file, along with the data that become the selection criteria of the user.
Relevant data for passages include Rasch calibrations, readability
characteristics, content area, psychometric characteristics, and so on.
The process of passage selection must allow the user to apply several
selection criteria simultaneously, while also providing for various
sampling strategies. The programming should be sufficiently sophisticated
so that a test or tests can be generated with predictable evaluative and
psychometric characteristics. For example, the program should be able to
deliver n parallel tests, with specified content, of fixed length, and with a
projected mean, standard deviation, and reliability in the population.

This section of the research program will also provide a set of
practical models or guidelines for using the cloze component of the TDN
in evaluation and research. Of particular interest is the derivation and
effective presentation of workable applications of matrix sampling, using
the cloze passages as a resource. Sirotnick (1974) attempted to provide
such models in a general presentation, but here it would be expected that
a number of detailed models could be derived and applied in simulated use
of the cloze passage pool. Sirotnick's presentation showed how a school
district could save considerable amounts of testing time and money by
applying matrix sampling to a large number of items and domains of content.
The flexibility of the cloze component was planned largely so that the
evaluator could begin to take advantage of the economy and efficiency of
the matrix sampling model in evaluation in reading.
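
The economy of matrix sampling can be illustrated with a small Python
sketch. It is a generic illustration of the idea rather than any of the
specific models referred to above: items are split into disjoint subtests,
students are split at random into as many groups, each group takes one
subtest, and the group mean scores on the subtests are summed to estimate the
mean score on the full item set. All names and the response data are
hypothetical.

```python
import random


def matrix_sample(item_ids, student_ids, n_subtests, seed=0):
    """Pair each disjoint item subtest with a random group of students."""
    rng = random.Random(seed)
    items = list(item_ids)
    students = list(student_ids)
    rng.shuffle(items)
    rng.shuffle(students)
    subtests = [items[k::n_subtests] for k in range(n_subtests)]
    groups = [students[k::n_subtests] for k in range(n_subtests)]
    return list(zip(subtests, groups))


def estimate_mean_total(plan, responses):
    """responses[(student, item)] is 1 or 0 for administered pairs only."""
    estimate = 0.0
    for subtest, group in plan:
        scores = [sum(responses[(s, i)] for i in subtest) for s in group]
        estimate += sum(scores) / len(scores)  # mean score on this subtest
    return estimate


items = list(range(12))
students = list(range(30))
plan = matrix_sample(items, students, n_subtests=3)

# Hypothetical responses: every student answers items 0-5 correctly and
# items 6-11 incorrectly, so the true mean total score is 6.
responses = {(s, i): int(i < 6)
             for subtest, group in plan for s in group for i in subtest}

print(estimate_mean_total(plan, responses))  # 6.0
```

Here each student answers 4 items instead of 12, yet the group-level mean is
recovered; individual student scores on the full item set are not available
from such a design.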

Presumably, in the finished product from this phase of the research,
the user would first explore the simulated evaluation models projected here.
He would then specify the parameters of the evaluation design that fitted
his situation (e.g., number of test groups, numbers of tests, sampling
plan for each test, confidence levels, etc.). Armed with these parameters,
he could then use the test assembly program to generate the required tests
in paper-punched tape form.

Decision-Making Utility of the Test

This component of the practical side of the research will go beyond the
processes of assembling items and tests to show the user how the test data
may be used in certain types of practical decision making. The decisions
to be addressed are largely of the placement type. The focus is on
assigning an individual or group to reading materials or to levels of the
curriculum. The basic problem to be addressed involves creating the
technical guidelines and techniques that will allow a user to generalize
from a cloze test score (in Rasch calibrations) to a segment of one or more
domain distributions of readability. The problem is illustrated in part
in Figure 6.4. For each grade level, there is a distribution of readability
of the written material in each content domain and subcategory. Generally,
the user wants to know how to assign a student to sections of a set of such
distributions so that the student can be given material he can actually
read.




Figure 6.4. Hypothetical distribution of passage readability for
one content area by grade level. (Horizontal axis: passage readability level.)

As part of the research, the sampling distributions of readability
scores of the domains and subcategories of content included in the cloze
test will be known, as will the distributions of readability scores for
passages with the same Rasch calibration. The problem is to work out the
technical parameters of using both sets of data to predict the range of
readability that is appropriate for an individual with a given score on
the cloze test. By generalization, the resultant model should further show
how well any given group "fits" the domains and categories of reading
materials sampled by the test.⁷

⁷Included here is the problem of defining statistically the cutoffs
that indicate the point of comprehension/no comprehension in a student's
test protocol. These cutoffs may vary by age or grade level.

In summary, this aspect of the applied component of the research will
derive practical procedures for estimating the probable level of functioning
(in terms of literal comprehension) of an individual or group in relation
to a given domain or category of written discourse. It would also be of
considerable value, if resources allowed, to extend the model to show a
local user how he could take into account the readability characteristics
of his own populations of students and reading materials to achieve a
generally improved match between student abilities and instructional
materials.
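
One way the placement problem might be approached is sketched below in
Python. It uses the standard Rasch model, in which the probability of success
depends only on the difference between a person's ability and a passage's
calibrated difficulty. The 0.5 cutoff, the passage data, and the function
names are assumptions for illustration only; the technical parameters of the
actual procedure, including the cutoffs, remain to be worked out in the
research.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch model: probability of success given ability and difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def readable_range(ability, passages, cutoff=0.5):
    """Readability range of passages the student is predicted to comprehend.

    passages is a list of (rasch_difficulty, readability_level) pairs;
    cutoff is a hypothetical comprehension/no-comprehension threshold.
    """
    levels = [level for difficulty, level in passages
              if rasch_probability(ability, difficulty) >= cutoff]
    return (min(levels), max(levels)) if levels else None


# Hypothetical calibrated passages: (Rasch difficulty, readability level).
passages = [(-2.0, 3.0), (-1.0, 4.5), (0.0, 6.0), (1.0, 7.5), (2.0, 9.0)]

print(readable_range(0.5, passages))  # (3.0, 6.0)
```

A fuller model would work from the distribution of readability scores among
passages sharing a calibration, rather than from a single readability value
per passage as assumed here.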

Conclusion

The foregoing discussion is an expansive outline of some of the studies
that might be conducted to examine the validity of the multiple-choice cloze
format as a measure of literal comprehension. The projected studies are
cast in the scientific framework of construct validity. That is, because
of the implications of meaning and use attached to the cloze testing
materials, it is deemed necessary to embed the test in a research program
that will tend to clarify both the meaning of the concept underlying the
test and the utility of the test in measuring the concept. Typically,
content-oriented or achievement tests are not embedded in such a research
program, but are considered valid on the grounds of content and convergent
validity.

The validation program defined here projects the gathering of evidence
of different types of validity on the cloze tests. The content validation
phase is a much more extensive effort than is usually mounted with tests
of achievement. Here, the research program will attempt to characterize
the content domains to which the test is referenced in considerable detail.
This effort will ultimately provide an improved basis for using a reading
test score, based on the multiple-choice cloze, in decisions that directly
affect how individuals and groups are assigned to domains of reading
instruction.

The construct validation phases of the proposed research are concerned
with evaluating the main effects and interactions of organismic and
situational variables on the development of literal comprehension, as
measured by the cloze exercises. The quasi-longitudinal component of this
research on test validity further provides a basis for examining the
theoretical and practical importance of the construct of literal
comprehension (and the cloze exercises as an effective measure of the
construct) across the years of public schooling. The potential outcomes of
this research would appear to provide the kind of information that is needed
to determine the utility of the cloze exercises as an important measure of
output in reading instruction.

Finally, concurrent with the validity studies planned for the cloze
testing materials, a program will be conducted to gradually transform the
test into a state of practical utility. This program will make extensive
use of modern technological developments with the intent of improving the
economy, usability, and applicability of the testing materials. These
applications constitute a model for future tests, which will not be tests in
the usual sense, but devices that can be tailored to measurement/evaluation
situations as needed.

CALIBRATION AND SCALING DESIGN

In addition to justifiable construct validity, the new multiple-choice
cloze testing system will have a very useful scale for score interpretation.
The new test scale will have distinct advantages over the scales of
commercially available, standardized tests. The scale for the multiple-choice
cloze (MCC) testing system will have equal intervals and a "low-difficulty
point" near zero. These properties alone will support the legitimate
assessment of literal comprehension over time while permitting the use of
unique test forms at each point in time. The reading passages comprising
this scale will be drawn from eleven content domains and cover difficulty
levels from first grade to college. The approach to be used in scaling
these reading passages will allow the estimation of reading ability in all
11 domains from one test in one domain. Thus, the most useful application
of the scale may be the construction of tests that are tailored to individual
students. A teacher, working with the Test Development Notebook, will be
able to select passages that will be targeted around an individual's true
ability. A test so designed will provide a precise assessment of a
particular student's ability in literal comprehension.

Actually, a teacher using the Test Development Notebook could construct
unique tests for each student in a classroom, several times throughout a
course of instruction or an academic year. This would provide a design for
achievement monitoring that is seldom used in schools today.

Trait Definition

The first step in the development and calibration of a test scale is
the specification of the trait under investigation. That is, what student
trait is actually measured by the MCC test?

In the early phases of trait definition, singular operational
definitions are counter-productive, "for the closure that strict definition
consists in is not a precondition of scientific inquiry but its culmination"
(Kaplan, 1964, p. 11). The veracity of the trait definition being
investigated, namely, literal comprehension, is open to strong criticism,
especially when conceived of operationally (see Chapter II in this proposal).

Operationally, we might say that literal comprehension is exactly what
the MCC test measures. Unfortunately, one inspection of a multiple-choice
cloze test could result in several different interpretations of what the
test might measure. In addition, if only an operational definition is
provided, then the burden of construct validation is thrust upon the consumer,
"who will inevitably make inferences beyond the universe of situations
representatively sampled by the test" (Cronbach, 1971, p. [illegible]). Thus,
trait definition is tied directly to construct validation because users will
demand, and legitimately so, that a test, if properly used, measure what the
developers say it measures.

In order to maintain the desired interpretations of the MCC test,
construct validation (see Chapter VI) must be designed carefully in order to
refute any substantive counter-interpretations of the test that might arise
from its use. This construct validation is actually a clarification and
justification of the operationalization chosen for measuring literal
comprehension. In order to be received properly, however, a particular
operationalization should have a firm conceptual basis, especially in the
behavioral sciences. The conceptual basis for the trait measured by the
multiple-choice cloze test is explained in detail in Chapters II through IV
of the present proposal.

Measurement Issues and Model

There are several measurement issues involved in the calibration of
the MCC test. These measurement issues arise from two sources: (1) the MCC
test format and (2) the requirements of the measurement model used to
calibrate the test. Generally speaking, calibration means estimating item
difficulties so that items can be scaled from the least difficult to the most
difficult. However, in the present context, the emphasis must shift to
passage calibration.

The format of the MCC test is radically different from that of
conventional tests of literal comprehension. As described earlier in this
proposal, no formal questions are asked in the MCC test. The student is simply
required to choose from three, four, or five alternatives the word that has
been deleted from the paragraph in question. The student's ability to
reconstruct the original paragraph reflects apprehension of the literal
meaning of the paragraph. The manifestation of this trait is considered to be
an all-or-none phenomenon; that is, apprehension occurs or does not on a
specific passage. Thus, the test format remains essentially the same from
grade 1 through college.

The multiple-choice cloze format with no formal questions will reduce
the importance of general intellectual skills in the student's response.
The format is designed to measure literal comprehension of a passage, not
the student's ability to comprehend and answer questions following the
passage. In the latter instance, skills beyond literal comprehension are
called into play. Thus, a major research question that must be answered is
whether or not the MCC test is unidimensional and thus measures a stable trait
across time.

A second complicating factor in the MCC test was the choice to delete
only nouns, verbs, adjectives, and adverbs. The effect of this decision on
the performance characteristics of the MCC test must also be investigated.
This choice may complicate the attempt to maintain a unidimensional measure
of literal comprehension due to a lack of systematic variation among the
deleted words.

The Rasch measurement model will be used to analyze and calibrate the
MCC test. This model has been chosen for two major reasons. When tests
have been constructed so as to meet certain specifications, "application of
the Rasch model gives person-free item calibrations and item-free person
measurements" (Wright and Mead, 1975, p. 2). Such objectivity in measurement
is seldom attained in the behavioral sciences. For example, if you want to
know a person's height, you measure him with a yardstick or another device.
Within reason, two different yardsticks will provide the same estimated
height. What happens when students are given two reading tests designed by
separate companies? Does one consistently get the same estimate of a student's
reading ability with the separate tests?

The Rasch model specifies a particular simple relationship between
person ability, item difficulty, and the probability of observing
a correct response. The implications of this specification are
that:

1) the variable measured is unidimensional
2) there are no strong relationships among persons or
items other than those specified by the model so
that responses of persons to items are stochastically
independent given their parameters in the model
3) items and persons do not differ substantially with
respect to other possible response factors not
represented in the model such as item discrimination,
person sensitivity, guessing or indifference. (Wright
and Mead, 1975, p. 2)

Thus, it will be necessary to analyze all of the response data collected
using the MCC test. A computer program is available for these analyses.¹
Following analysis, if the test data provide person-free item calibration
and item-free person measurement, then the three specifications of the model
must have been met by the original design of the test. These analyses will
provide the calibration data needed for equating test difficulty levels and
test content domains from grade 1 through college.
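
The specification quoted above does not spell out the functional form of the
model. A minimal sketch, assuming the standard logistic form of the Rasch
model with ability and difficulty expressed in logits, shows why the
comparison of two persons does not depend on which item is used. The function
and variable names are illustrative and are not taken from CALFIT.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the logistic Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Two hypothetical persons compared on an easy item and on a hard item.
able, less_able = 1.5, -0.5
for difficulty in (-1.0, 2.0):
    gap = (log_odds(rasch_probability(able, difficulty))
           - log_odds(rasch_probability(less_able, difficulty)))
    # The gap equals able - less_able (2.0 logits) whatever the item difficulty.
    print(f"item difficulty {difficulty:+.1f}: log-odds gap = {gap:.3f}")
```

Under this form the difference between two persons is the same on every item,
which is one way of reading "item-free person measurement."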
¹CALFIT: Sample-free item calibration with a Rasch measurement model,
by Benjamin Wright and Ronald Mead, Statistical Laboratory, Department of
Education, The University of Chicago, Chicago, Illinois, 1975. Note that
this program is now operational at the State Education Department in Albany,
New York.

²For details of the Rasch model, refer to Georg Rasch, Probabilistic
models for some intelligence and attainment tests, Copenhagen: Nielsen &
Lydiche, 1960; Benjamin Wright, Sample-free test calibration and person
measurement, Proceedings of the 1967 Invitational Conference on Testing
Problems, Princeton, N.J.: Educational Testing Service, 1968, pp. 85-101.

Motivation for Using the Rasch Model

Fifty years ago Thorndike complained that contemporary intelligence
tests failed to specify "how far it is proper to add, subtract, multiply,
divide, and compute ratios with the measures obtained" (Thorndike, 1926,
p. 1). He asserted that a good measurement of ability would be one "on
which zero will represent just not any of the ability in question, and 1, 2,
3, 4, and so on will represent amounts increasing by a constant difference"
(Thorndike, 1926, p. 4).

Thorndike had the courage to complain because he believed he had worked
out a solution to the problem in his own intelligence test. So did
Thurstone (1925).

Thurstone's method was to translate the proportion in an age group
passing any item into a unit normal deviate and to use these values as the
basis for scaling. Common scale values for different age groups were
obtained by assuming a linear relationship between the different scale values
of overlapping items and using the different group means and standard
deviations as the parameters for a transformation onto a common scale.
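
The two steps just described can be sketched in a few lines. This is only an
illustration under stated assumptions: the proportions are invented, the
function names are hypothetical, and the linking uses the means and standard
deviations of the overlapping items' values in each group, which is one common
reading of the procedure rather than a reproduction of Thurstone's own
computations.

```python
from statistics import NormalDist, mean, pstdev

def normal_deviates(proportions_passing):
    """Convert each proportion passing into a unit normal deviate.

    The sign is flipped so that harder items (fewer passing) get larger values.
    """
    unit_normal = NormalDist()
    return [-unit_normal.inv_cdf(p) for p in proportions_passing]

def linking_constants(reference_values, other_values):
    """Slope and intercept carrying `other_values` onto the reference scale,
    assuming the values for the overlapping items are linearly related."""
    slope = pstdev(reference_values) / pstdev(other_values)
    intercept = mean(reference_values) - slope * mean(other_values)
    return slope, intercept

# Hypothetical proportions passing three overlapping items in two age groups.
younger = normal_deviates([0.60, 0.40, 0.20])
older = normal_deviates([0.90, 0.75, 0.50])
slope, intercept = linking_constants(younger, older)
older_on_younger_scale = [slope * value + intercept for value in older]
print(older_on_younger_scale)
```

Because the slope and intercept come from the group statistics themselves, a
differently sampled group would yield different scale values, which is the
shortcoming taken up in the following paragraphs.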

Thurstone redid a piece of Thorndike's (actually Trabue's) work to show
that the Thurstone method was superior (Thurstone, 1927). But the methods
are essentially the same and they share similar shortcomings.

Thurstone's "absolute scale" (1925, p. 438; 1927, pp. 518-19) yields an
interval scale measurement of a kind. But no useful interpretation of the
"equal" scale units has ever been proposed.

In addition to item homogeneity, the Thurstone method requires the
assumption that ability is normally distributed within age groups and that
there exist stable parameters for these distributions. Should the sampling
of intended populations be biased, so will be the scale values. They cannot
be invariant to sampling. In particular, samples different in ability will
produce scale values different in magnitude and dispersion.

¹This section and the one that follows on "Application of the Rasch
Model" were written by Dr. Benjamin Wright and Ronald Mead at the Department
of Education, University of Chicago, Chicago, Illinois, 1975.

Thurstone used the 1925 version of his method for the rest of his
scaling life (e.g., Thurstone, 1947), but the majority of test calibrators
have relied on the simpler techniques of percentile ranks and standard scores.
The inadequacies of these methods were clarified by Loevinger's 1947 analysis
of the construction and evaluation of tests of ability (Loevinger, 1947,
p. 42).

Loevinger showed that test homogeneity and scale monotonicity were
essential criteria for adequate measurement. In addition, "An acceptable
method of scaling must result in a derived scale which is independent of the
original scale and of the original group tested" (Loevinger, 1947, p. 46).

Summing up the test calibration situation in 1947, Loevinger says "No
system of scaling has been proved adequate by the criteria proposed here,
though these criteria correspond to the claims made by Thurstone's system"
(Loevinger, 1947, p. 43). As for reliabilities based on correlations,
"Until an adequate system of scaling is found, the correlation between tests
of abilities, even between two tests of the same ability, will be accidental
to an unknown degree" (Loevinger, 1947, p. 46).

Twenty-five years ago Gulliksen concluded his Theory of Mental Tests
(1950) with the following observation:

Relatively little experimental or theoretical work has
been done on the effect of group changes on item parameters.
If we assume that a given item requires a certain ability
(A), the proportion of a group answering that item correctly
will increase and decrease as the ability level of the
group changes. The amount of this change will be greater
for an item that is highly correlated with ability A than
for one that correlates only moderately with ability A.
If we have some standard measure of ability A, it may be
that the ability level at which 50 percent pass and 50
percent fail would not be subject to as much fluctuation
as the proportion of correct responses. As yet there has
been no systematic theoretical treatment of measures of
item difficulty directed particularly toward determining
the nature of their variation with respect to changes in
group ability. Neither has the experimental work on item
analysis been directed toward determining the relative
invariance of item parameters with systematic changes in
the ability level of the group tested. (Gulliksen, 1950,
pp. 392-93)

At the 1953 E.T.S. Invitational Conference on Testing Problems, Tucker
suggested that "An ideal test may be conceived as one for which the
information transmitted by each of the possible scaled scores represents a
location on some unitary continuum so that uniform differences between scaled
scores correspond to uniform differences between test performances for all
score levels" (Tucker, 1953, p. 27). He also proposed the comparison of groups
differing in ability as a strong method for evaluating test homogeneity
(Tucker, 1953, p. 25). But the other participants in the conference
belittled his proposals as impractically idealistic.

Fifteen years ago Angoff wrote in an encyclopedia article on
measurement and scaling:

Most of the test scales now in use derive their systems
of units from data taken from actual test administrations,
and thus are dependent on the performance of the groups
tested. When so constructed, the scale has meaning only
so long as the group is well defined and has meaning, and
bears a resemblance in some fashion to the groups or
individuals who later take the test for the particular
purposes of selection, guidance, or group evaluation.
However, if it is found that the sampling for the
development of a test scale has not been adequate, or that the
group on which the test has been scaled has outlived its
usefulness, possibly because of changes in the defined
population or because of changes in educational emphases,
then the scale itself comes into question. This is a
serious matter. A test which is to have continued
usefulness must have a scale which does not change with
the times, which will permit acquaintance and
familiarity with the system of units, and which will permit an
accumulation of data for historical comparisons.
(Angoff, 1960, p. 815)

And yet the faulted methods referred to and criticized by Loevinger,
Gulliksen, and Angoff are still widely used in test construction and
measurement in spite of the fact that considerable evidence has accumulated in
the past fifteen years that much better methods are available and practical.

The new attack on mental measurement was first formulated nearly
twenty-five years ago by a Danish mathematician, Georg Rasch. Rasch began
his work on psychological measurement in 1945 when he standardized a group
intelligence test for the Danish Department of Defense. It was in carrying
out that item analysis that he first "became aware of the problem of
defining the difficulty of an item independently of the population and the
ability of an individual independently of which items he has actually solved"
(Rasch, 1960, p. viii). By 1952 he had laid down the basic foundations for his
new psychometrics and worked out two probability models for the analysis of
oral reading tests. In 1955 he reanalyzed the intelligence test data and
developed the essentials of a probability model for item analysis.

Rasch first published his concern about the problem of sample dependent
estimates in his 1953 article on simultaneous factor analysis in several
populations (Rasch, 1953). But his work on item analysis was unknown in this
country until the spring of 1960 when he visited Chicago for three months,
gave a paper at the Berkeley Symposium on Mathematical Statistics (Rasch,
1961), and published a book, Probabilistic Models for Some Intelligence and
Attainment Tests (Rasch, 1960).

These publications contain a detailed presentation and application of
a probability model for the analysis of psychological test data (Rasch,
1960, pp. 73-79, pp. 107-125, pp. 168-182; 1961). The application of the
model yields measurements which satisfy Loevinger's and Tucker's criteria and
resolve the problems outlined by Gulliksen and Angoff. But unfortunately even
after 1960 not many social scientists learned of Rasch's work. . . . Rasch's
book, published in Denmark, reached only a handful of scholars in this
country. The work has crucial implications for the future of psychometrics
and for measurement in social science research in general.

Of the 1960 book, Tucker says "In the monograph Rasch presents several
very interesting and quite sophisticated developments in mathematical test
theory" (Tucker, 1963, p. 386). Of the item analysis model Sitgreaves says
"the model proposed and the questions that it raises are extremely
interesting. Overall, the author has made a substantial contribution to model
building in tests of ability" (Sitgreaves, 1963, p. 220). Coombs says
that Rasch's work is a "major contribution and a new approach in
psychometrics which is worthy of very serious study" (Coombs, 1964, p. 238).

In her discussion of person and population as psychometric concepts,
Loevinger (1965) says:

Rasch (1960) has devised a truly new approach to psychometric
problems. He makes use of none of the classical
psychometrics, but rather applies algebra anew to a
probabilistic model. The probability that a person will
answer an item correctly is assumed to be the product of
an ability parameter pertaining only to the person and a
difficulty parameter pertaining only to the item. Beyond
specifying one person as the standard of ability and one
item as the standard of difficulty, the ability assigned
to an individual is independent of that of other members
of the group and of the particular items with which he
is tested; similarly for the item difficulty. Indeed,
these two properties were once suggested as criteria for
absolute scaling (Loevinger, 1947); at that time proposed
schemes for absolute scaling had not been shown to satisfy
the criteria, nor has Guttman scaling. Thus, Rasch
must be credited with an outstanding contribution to one
of the two central psychometric problems, the achievement
of nonarbitrary measures. Rasch is concerned with a
different and more rigorous kind of generalization than
Cronbach, Rajaratnam, and Gleser. When his model fits,
the results are independent of the sample of persons and
of the particular items within some broad limits. Within
these limits, generality is, one might say, complete.

When a new method for solving old problems is proposed, an important
question is, Does it work in practice?

Application of the Rasch Model

In his 1960 book, Rasch documents at length the application of his
model to a four-test intelligence battery used by the Danish army (Rasch,
1960, pp. 80-107). The model is plainly inappropriate for two of the tests
but a good fit to the other two. In subsequent, but still unreported
analyses of these data, Rasch was able to track down a test administration
factor causing one of these tests not to fit the model and to show that upon
adjustment for this factor, the residual data of this test also fit.

Brooks and Blommers (1965) applied Rasch's model to Lorge-Thorndike
Intelligence Tests administered to eighth and tenth graders. Their
purpose was to evaluate the stability of test item parameters when given to
groups of different ability. They found that the model fit the
Lorge-Thorndike data rather well and that the estimates of item difficulty
were stable.

Since then there have been a series of applications made in Denmark by
Rasch's students. For example, Andersen (196[illegible]) applied a
multiple-response generalization of the model to an attitude inventory
administered to Danish recruits. He was able to show that his original test
contained two homogeneous subsets of items which, when identified and
isolated, each in themselves fit the model well. Andersen's subsequent work on
the mathematical and statistical aspects of the model has been extensive (see
Andersen's references).

In 1968, Wright applied the model to 48 reading comprehension items
on the Law School Admission Test. He demonstrated the sample-freeness of
the calibrations by estimating item difficulties separately for the highest
and lowest scoring groups. Since the difficulty estimates based on the
extreme groups were statistically equivalent, he had shown that the estimates
were independent of the ability of the persons in the calibration sample and
could be equally used over the entire range of ability. This method of
demonstrating the practical utility of the model has been successfully applied
on numerous occasions (e.g., on the more than [number illegible] different
sets of test data brought by participants to the AERA Presessions on the
Rasch model held in 1969, 1970, and 1975).
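
The logic of this check can be imitated on simulated data. The sketch below
is not Wright's procedure and not CALFIT. It assumes the logistic Rasch form
and, for brevity, uses a pairwise conditional comparison (among persons who
answer exactly one of two items correctly, the odds of which item it was
depend only on the two difficulties) to estimate one difficulty difference
separately in a low-ability and a high-ability sample. All names and
parameter values are hypothetical.

```python
import math
import random

def simulate(abilities, difficulties, rng):
    """Simulate 0/1 responses under the logistic Rasch model."""
    data = []
    for b in abilities:
        row = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(b - d))) else 0
               for d in difficulties]
        data.append(row)
    return data

def pairwise_difference(data, i, j):
    """Estimate difficulty[j] - difficulty[i] from persons who answered
    exactly one of the two items correctly."""
    i_only = sum(1 for row in data if row[i] == 1 and row[j] == 0)
    j_only = sum(1 for row in data if row[i] == 0 and row[j] == 1)
    return math.log(i_only / j_only)

rng = random.Random(1975)
difficulties = [-1.0, 0.0, 1.0]
low_group = simulate([rng.gauss(-1.0, 1.0) for _ in range(5000)],
                     difficulties, rng)
high_group = simulate([rng.gauss(1.0, 1.0) for _ in range(5000)],
                      difficulties, rng)

# True difference between the hardest and easiest item is 2.0 logits.
low_estimate = pairwise_difference(low_group, 0, 2)
high_estimate = pairwise_difference(high_group, 0, 2)
print(low_estimate, high_estimate)
```

If the data fit the model, the two estimates should agree within sampling
error even though the two samples differ by about two logits in average
ability.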

Durovic (1970) reported the successful application of the Rasch model
to test development for the New York State Department of Civil Service.
He found it especially useful for identifying poor items. In the several
examples of aptitude and achievement type tests that he has investigated,
items identified as misfitting were easily recognized subsequently as
defective for clear-cut substantive reasons. The bad items typically required
types of behavior or specific prior knowledge not needed for other items.

The American Guidance Service has used the model in their test
construction work since 1970. Two of their tests, KEYMATH (Connolly, Nachtman,
and Pritchett, 1971) and the Woodcock Reading Mastery Test (Woodcock, 1974)
were entirely built on Rasch principles. This involved not only the selection
and calibration of items but also the development of recording forms which
relate the tested person's estimated ability, in a criterion way, to the
specific skills and deficiencies he has and, in a normative way, to his grade
level.

Willmott and Fowles (1974) have also reported extensive application of
the model in England. In connection with the Sixteen Plus Examining Project
at the National Foundation for Educational Research of England and Wales,
they applied it successfully to tests of reading ability, English
comprehension, geography, science, mathematics and physics. While discussing
questions of chaining items and building item pools, their emphasis was on fit
to the model. They concluded that while to obtain the maximum benefits of
the model it is necessary to take the trouble to construct a homogeneous
set of items, this was not mandatory in order to use the model to obtain
measurements far better than those ordinarily available.

Spada and Fischer (1973) used the framework of the logistic latent
trait model for a scientific analysis of a projective inkblot test. They
were able to formulate the specific (and conflicting) models of personality
implied by the coding and scoring rules of the Rorschach and Holtzman tests.
In an empirical test on 350 Rorschach and 305 Holtzman protocols, the
Holtzman approach to scoring, which corresponds to that required by the Rasch
model, was found to represent the data more adequately than the Rorschach
scoring. The tests of fit were found useful in identifying misfitting
inkblots and in modifying them to provoke more interpretable responses.

Bashaw (1974) and Rentz (1974) have completed a successful Rasch
equating of seven reading tests used in the National Anchor Test Study by
calibrating all items on all forms and all levels of each test on a common
scale. Their results are essentially equivalent to the far more costly and
awkward methods employed by ETS in the "official" equating but required half
as much data, a third as much time and one tenth the processing budget. This
demonstrated dramatically the simplicity and utility of the Rasch model over
alternative methods of test equating.

Kifer and Bramble (1974) used a final examination constructed around
performance objectives to illustrate the model's application to criterion
referenced testing. They discuss how to select items after the criterion is
set and how to control the two types of classification errors. The standard
errors of measurement that the model provides for each ability estimate make
it possible to compute explicitly the probabilities that a person classified
as a "master" actually lies below the criterion and that a person classified
as a "non-master" actually lies above.
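
As an illustration of the kind of computation this makes possible, the
sketch below treats an ability estimate and its standard error as a normal
distribution for the true ability. That normal approximation, the cutoff, and
the numbers are assumptions made for the example and are not taken from Kifer
and Bramble.

```python
from statistics import NormalDist

def probability_below(criterion, estimate, standard_error):
    """Probability that the true ability lies below the criterion, treating
    the estimate and its standard error as a normal distribution."""
    return NormalDist(mu=estimate, sigma=standard_error).cdf(criterion)

criterion = 0.5  # hypothetical mastery cutoff, in logits

# A person classified as a "master" (estimate above the cutoff).
false_master = probability_below(criterion, estimate=1.0, standard_error=0.4)

# A person classified as a "non-master" (estimate below the cutoff).
false_non_master = 1.0 - probability_below(criterion, estimate=0.2,
                                           standard_error=0.4)

print(round(false_master, 3), round(false_non_master, 3))
```

The closer an estimate lies to the cutoff relative to its standard error, the
larger the chance that the classification is wrong.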

Reckase (1975) used simulated data to investigate the utility of the
Rasch model in connection with tailored testing. His results indicated
that with reasonable stopping rules, the estimated abilities converged
quickly to the true value (only eight or ten items were required in some
cases). He also found that badly off-target tests produced biased estimates.
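
Tailored testing rests on re-estimating ability as responses accumulate. The
sketch below shows only that estimation step, assuming the logistic Rasch form
and a damped Newton-Raphson search; it does not reproduce Reckase's simulation
or his finding about bias. It does show a related effect: when a test is far
too hard for the person, the standard error of the estimate is larger. Names
and values are hypothetical.

```python
import math

def probability(ability, difficulty):
    """Logistic Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, difficulties, iterations=25):
    """Maximum likelihood ability estimate and its standard error.

    Assumes a mixed response string (not all right or all wrong), for which
    a finite estimate exists.
    """
    ability = 0.0
    raw_score = sum(responses)
    for _ in range(iterations):
        probs = [probability(ability, d) for d in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - sum(probs)) / information
        ability += max(-1.0, min(1.0, step))  # damp very large steps
    probs = [probability(ability, d) for d in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return ability, 1.0 / math.sqrt(information)

ten_items = [0.0] * 10  # hypothetical test, all items at the same difficulty

# On target: the person answers half of the items correctly.
on_target = estimate_ability([1] * 5 + [0] * 5, ten_items)

# Off target: the same test is far too hard, only one item correct.
off_target = estimate_ability([1] + [0] * 9, ten_items)

print(on_target, off_target)
```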
MCC Item Calibration Using the Rasch Model

The two previous sections in this report have supported the
application of Rasch measurement models in educational test development. During
the spring of 1975, thirty-six MCC test forms were administered to 5,000
urban students in grades 1 through 9. These MCC test forms were constructed
from passages in the Reading/Literature section of the Textual Domain. All
of these MCC test forms have been analyzed using the Rasch measurement model.
A complete description of a Rasch analysis for one test form will provide a
basis for understanding Rasch item/passage analysis, item/passage
calibration, and test equating.

Figure 7.1 displays the distribution of subjects by total test scores
on Form 14 of the MCC test. This histogram is scaled to fill the page; thus,
the modal score (i.e., the measure that occurs most frequently in the
distribution) is displayed as 100 percent. The modal value for Form 14 is a
score of 51 out of a possible 60 indicating that the 4th, 5th, and 6th graders
who took Form 14 did quite well on it.

Figure 7.2 displays the number of subjects who answered each item
correctly and the proportion of correct responses on each item. In Rasch
terminology, this is a distribution of item easiness. Inspection of Figure 7.2
with horizontal lines drawn between items for the six different passages
reveals the tendency for item easiness to change with the different passages.
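
The quantities plotted in such a figure are simple to compute. The sketch
below uses a small invented response matrix and assumes, purely for
illustration, that items are grouped into passages in consecutive blocks of
equal size; neither the data nor the function names come from the MCC
analysis.

```python
def item_easiness(responses):
    """Count and proportion of correct responses for each item.

    `responses` is a list of persons, each a list of 0/1 item scores.
    """
    n_persons = len(responses)
    counts = [sum(column) for column in zip(*responses)]
    return [(count, count / n_persons) for count in counts]

def passage_means(proportions, items_per_passage):
    """Average proportion correct within consecutive blocks of items."""
    return [sum(proportions[start:start + items_per_passage]) / items_per_passage
            for start in range(0, len(proportions), items_per_passage)]

# Hypothetical data: four persons, two passages of two items each.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
easiness = item_easiness(responses)
proportions = [proportion for _, proportion in easiness]
print(easiness)
print(passage_means(proportions, items_per_passage=2))
```

A shift in the block averages from one passage to the next is the pattern the
horizontal lines in the figure are meant to make visible.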

Table 7.1 shows the results of the estimation process in the CALFIT
computer program as developed by Wright and Mead (1975). The unconditional
maximum likelihood procedure was used for these estimates of item difficulty¹

¹When distributions are skewed like the distribution in Figure 7.1 for
MCC Form 14, an unconditional maximum likelihood estimation routine is used
in the CALFIT computer program for Rasch analyses. This routine provides
more accurate estimates for skewed data than the less expensive, but
approximate, estimation routine.
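
For readers unfamiliar with the term, the sketch below shows one textbook
form of unconditional (joint) maximum likelihood estimation for the logistic
Rasch model: item difficulties and person abilities are updated in turn by
damped Newton-Raphson steps until both sets of estimates settle. It is not
the CALFIT routine, it omits any bias correction, and the data and names are
invented for the example.

```python
import math

def probability(ability, difficulty):
    """Logistic Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def clamp(step, limit=1.0):
    """Limit the size of a Newton-Raphson step."""
    return max(-limit, min(limit, step))

def unconditional_ml(responses, iterations=100):
    """Alternate updates of item difficulties and person abilities.

    Assumes every person and every item has a mixed (non-zero, non-perfect)
    score, so that finite estimates exist.
    """
    n_items = len(responses[0])
    item_scores = [sum(column) for column in zip(*responses)]
    person_scores = [sum(row) for row in responses]
    difficulties = [0.0] * n_items
    abilities = [0.0] * len(responses)
    for _ in range(iterations):
        for i in range(n_items):
            probs = [probability(b, difficulties[i]) for b in abilities]
            information = sum(p * (1.0 - p) for p in probs)
            difficulties[i] += clamp((sum(probs) - item_scores[i]) / information)
        centre = sum(difficulties) / n_items
        difficulties = [d - centre for d in difficulties]  # fix the origin
        for v in range(len(abilities)):
            probs = [probability(abilities[v], d) for d in difficulties]
            information = sum(p * (1.0 - p) for p in probs)
            abilities[v] += clamp((person_scores[v] - sum(probs)) / information)
    return difficulties, abilities

# Hypothetical data: six persons by three items.
data = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
difficulties, abilities = unconditional_ml(data)
print(difficulties)
print(abilities)
```

Centering the difficulties at zero fixes the origin of the scale, which is
otherwise arbitrary in this model.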
Figure 7.2. Rasch analysis of MCC Form 14:
distribution of easiness - subjects

Rasch Analysis, MCC Form 14
Estimation of Item Difficulty and Group Ability

Table 7.1 presents the estimates of item difficulty and group ability.
"The information in the table is organized in two sections. The left side
reports the item estimation process. For each item, its difficulty estimate
and the standard error of this estimate are given. Items are identified by
sequence number, which is internal to a given run and would change if items
were added or deleted in other runs, and by item name, a four-character
alphameric, supplied by the user. . . . The right side of the table contains
the relation between observable test score and the corresponding estimates
of ability. The 'ability' column contains the estimate of ability implied by
each possible score" (Wright and Mead, 1975).

The "Score Group" and "Group Ability" columns in Table 7.1 indicate
that a student who obtained a score of [...] on these Form 14 items, as
calibrated, would be assigned an ability of [...] with a standard error of
0.33 for that estimated ability.
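
The relation between a raw score and its implied ability can be sketched in a few lines. The following is a minimal illustration, not the CALFIT routine: it assumes the item difficulties are already calibrated (in logits) and solves for the ability at which the expected score under the Rasch model equals the observed score. The function names and the Newton-Raphson solver are assumptions made for this example.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def ability_for_score(score, difficulties, tol=1e-8, max_iter=100):
    """Return (ability, standard_error) implied by a raw score.

    Solves sum_i P_i(ability) = score by Newton-Raphson. Scores of zero and
    of the maximum have no finite estimate and are rejected.
    """
    n_items = len(difficulties)
    if not 0 < score < n_items:
        raise ValueError("score must lie strictly between 0 and the item count")
    # Starting value: log-odds of the raw score, shifted by the mean difficulty.
    ability = math.log(score / (n_items - score)) + sum(difficulties) / n_items
    for _ in range(max_iter):
        probs = [rasch_probability(ability, d) for d in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (score - sum(probs)) / information
        ability += step
        if abs(step) < tol:
            break
    probs = [rasch_probability(ability, d) for d in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    # The standard error is the inverse square root of the test information.
    return ability, 1.0 / math.sqrt(information)
```

With four hypothetical items of difficulty zero, a score of 2 yields an ability of zero with a standard error of 1.0; longer tests yield smaller standard errors, which is why each score group in a table of this kind carries its own error value.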

Table 7.2 begins an analysis of the fit of the data to the Rasch model.
For these test data, the students were separated into six groups, by score.
"The table contains one row for each item,¹ identified both by internal
sequence number and user supplied item name. . . . The body of the table is
in three sections: The left six columns contain the item characteristic
curves, the proportion of subjects in each group who answered each item
correctly. This should approximate the shape of the logistic curve for items
that fit the model. . . . The center six columns contain the number of
answers unexplained by the model. It is computed as the number of correct
answers observed in a group minus the number that would be predicted by the
model for a group of that size and ability on an item of that difficulty. A
positive number indicates that the group did better on the item than we
would predict from their performance on the other items. A negative number
indicates that they did worse. . . . The third section of the table contains
statistics for testing the fit of each item in each score group. They are
approximately distributed chi-square statistics with one degree of freedom
. . . the fit of each item over all groups. Since the deviations from the
model . . . expected values of one, and can be evaluated as F-ratios with
numerator degrees . . . of the number of subjects in each group. The data
below the table are the mean and standard deviation of the entries in each
column. Under the hypothesis that the model fits the data, these have
expected values of 1.0 for the mean and 1.4 for the standard deviation"
(Wright and Mead, 1975).

¹The user name in the Rasch analysis of the MCC Test indicates an
item in a particular passage on the test form; e.g., P1I1 = passage 1,
item 1; P2I4 = passage 2, item 4; and so forth.

Table 7.2. Rasch Fit Analysis of MCC Form 14
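
The "unexplained answers" and the one-degree-of-freedom fit statistic described in the quoted passage can be illustrated with a short sketch. It assumes the textbook form of the statistic (squared residual divided by the binomial variance of the count); the exact computation inside CALFIT may differ, and the function name and example values are hypothetical.

```python
import math


def group_fit(n_subjects, n_correct, group_ability, item_difficulty):
    """Return (residual, chi_square) for one item in one score group.

    residual   = observed correct answers minus the number the Rasch model
                 predicts for a group of this size and ability
    chi_square = squared residual over the binomial variance of the count,
                 approximately chi-square with one degree of freedom
    """
    p = 1.0 / (1.0 + math.exp(item_difficulty - group_ability))
    expected = n_subjects * p
    residual = n_correct - expected
    variance = n_subjects * p * (1.0 - p)
    return residual, residual ** 2 / variance


# Hypothetical group: 100 students of ability 0.0 on an item of difficulty 0.0,
# of whom 60 answered correctly (the model predicts 50).
residual, chi_square = group_fit(100, 60, 0.0, 0.0)
```

A positive residual means the group did better on the item than the model predicts, and a negative residual means it did worse. A chi-square statistic with one degree of freedom has a mean of 1 and a standard deviation of the square root of 2, about 1.4, which matches the expected values quoted above.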

Table 7.3 "contains fit information in three sequences: serial order,
difficulty order, and fit order. In each case the information given is item
difficulty, an index of item discriminating power and the item fit mean
square" (Wright and Mead, 1975, p. 18). The item/total point biserial
correlation is on the extreme right, in fit order.

"Under the left side of the table, the mean and standard deviation are
given for item difficulty, discrimination, and fit mean square. Difficulty
estimates are centered at zero and discrimination indices at one. The fit
mean square has an expected mean of one and an expected standard deviation
of the square root of two over its degrees of freedom which will vary between
1 and 5 depending on the number of score groups defined by the analysis. The
three correlations of difficulty with discrimination, difficulty with fit mean
square and discrimination with fit mean square have zero expectations"
(Wright and Mead, 1975, p. 19).

Figures 7.3, 7.4, 7.5, and 7.6 display the following plots, respectively:
(1) item z against the probability of a person in an ability group
answering the item correctly, (2) item fit mean square against item difficulty,
(3) item fit mean square against the index of item discrimination, and (4) item
discrimination index against item difficulty. These figures complete the
detailed analysis of the fit of the data on MCC Form 14 to the Rasch model.

By reference to the third column in Table 7.3, it will be observed that
there are several items that do not fit the Rasch model. Two of these items
are numbered 55 and 47. Their high "Fit Mean Square" indices suggest they
are not operating as expected or like the other items. Item 55 required the
students to reconstruct the following sentence:

     Everyone was bargaining back and forth.

The possible alternative choices for the deleted word included:

     a. snugly
     b. merrily
     c. loudly
     d. honestly
     e. painfully

The correct answer was "loudly." The alternatives, according to design
specifications, were to have been semantically implausible. In item 55,
however, "merrily" and "honestly" are both semantically plausible and there
is nothing in the passage to suggest that "loudly" was more appropriate than
"merrily" or even "honestly," though it seems incongruous with "bargaining."
This fault in the item undoubtedly forced it to operate at odds with the rest
of the test.

Item 47 required the students to reconstruct the following sentence:

     Neither (race) pleased the gods ____.

The alternative choices included:

     a. obediently
     b. entirely
     c. miraculously
     d. modestly
     e. incompetently

The explanation of the misfit of this item does not reside in the
distractors, which fulfill the condition of semantic implausibility. Rather,
the sentence upon which this item was based seems to have presented two
problems. First, it contained another item which was very difficult, probably
because the word deleted, race, is used in a somewhat unfamiliar way (i.e.,
to refer to elves and dwarfs, the subjects of the passage). Undoubtedly,
the difficulty of this item contributed to the difficulty of the following,
or misfitting, item. Further, it seems likely that a more typical rendition
of the sentence containing the misfitting item would have placed the adverb
(entirely) between the subject and the verb, rather than terminally.
Possibly the difficulty of the previous item and the somewhat uncommon
sentence structure combined to produce difficulty and confusion.

The calibration of the items in Form 14 would proceed by eliminating
those students who seem to be causing most of the misfit with the Rasch model.
In most cases, students who perform at the extremes of ability distributions
cause the greatest misfit. After removing the students with extreme ability
characteristics, the fit of items to the model usually improves. When all
of the items fit the model, the general structure of the test would no longer
be suspect. The resulting item difficulties or average passage difficulties
would thus be placed on an equal-interval scale from low to high difficulty.
This technique for scaling the passages without reference to the students
taking the test provides the basis for placing all of the passages, across
content domains, on the same difficulty scale. A complete calibration
design network must be developed to guarantee that all items, and thus all
passages, are calibrated on a common scale.

Rasch Calibration on a Common Latent Variable 
As previously noted, our major interest in calibrating items is the
placement of items on a common scale for a latent variable called literal
comprehension. Speaking in operational terms, "When the pool of items from
which we select the elements for a best possible test has been calibrated on
a latent variable, then these items and their locations on the latent variable
provide its operational definition. A measurement of a person on the variable
will place him among items with difficulties near his estimated ability. The
meaning of his position on the variable will be defined by these nearby items"
(Wright and Douglas, 1975). Therefore, if one is interested in measuring
a person's ability to literally comprehend written discourse, one could
develop and calibrate items on the hypothesized trait, then use the items to
estimate a person's ability on that latent trait. From a measurement
perspective, the best estimate of that ability would come from items
calibrated on a common, equal-interval, zero-point scale. This type of scale
can be developed with existing methodologies.¹ Calibration of items or
passages on a common scale would include the following major tasks:

     1. Define conceptually the variable or trait under investigation;

     2. Choose a test or passage/item format requiring skills
        specified in the conceptual definition;

     3. Prepare a passage/item pool;

     4. Develop a calibration plan;

     5. Construct calibration tests;

     6. Administer the calibration tests;

     7. Collect calibration data and analyze with Wright-Mead CALFIT
        computer program;

     8. Select items from item-analysis results based on discrimination
        and fit-mean-square statistics;

     9. Calibrate selected items based on difficulty estimates;

    10. Calibrate passages based on average passage difficulty;

    11. Determine the effect a particular content domain might have
        on the difficulty of items;

    12. Assuming item and passage difficulties are "content free,"
        chain or link all passage calibrations onto one underlying
        scale with equal intervals and a meaningful zero point.

¹Personal communication with Benjamin D. Wright and Ronald Mead, 1975.

This calibration plan will result in the placement of all cloze passages
in the Test Development Notebook (TDN) on a common scale. Each passage will
be assigned a difficulty index that can guide teachers in the selection of
those passages that would be most appropriate for testing their students.
Tasks 1 through 4 have been completed. Over 1,000 passages, in multiple-
choice cloze format, are available for administration and calibration. These
passages cover reading content in the following domains:

     1. Textual Materials in Reading, Language Arts, Social Studies,
        Science, and Mathematics;

     2. Citizen Material from newspapers and magazines;

     3. Consumer Materials from catalogs, advertising, instructions,
        and so forth; and

     4. Reference Material from test instructions, children's
        magazines, encyclopedias, and so forth.

In addition, the calibration plan will include the 300 multiple-choice
passages that have been developed for the Test Development Notebook (TDN).

The calibration plan. Item/passage calibration on a common variable
involves careful planning, complex data management, and extensive test
analysis. The design to be presented will require the construction and
administration of more than 400 separate test forms. These forms will be
administered to over 50,000 students. In order to place each passage on a
common scale, a data management system must be developed to monitor the
estimation and re-estimation of the difficulties of 2,500 passages (including
some redundancies) as they appear on different test forms.

With the Rasch basic measurement model, calibrating items or passages
on a common scale is possible when the same items or passages are placed on
two separate test forms. It is best when one of the test forms is slightly
more difficult than the other. This approach links two test forms together
through common items or passages. For example, define two test forms, A and
B, each containing 6 multiple-choice cloze passages with 10 items each. On
Test A the passages increase in difficulty to the 6th passage. On Test B,
passage 1 is identical to the 6th passage in Test A, and is followed by 5 more
passages of increasing difficulty. The basic linking model can be expressed
graphically as follows:



     Test Form A:   Passages A-1, A-2, A-3, A-4, A-5, A-6 (B-1)

     Test Form B:   Passages B-1 (A-6), B-2, B-3, B-4, B-5, B-6

In a calibration plan, one sample of students would respond to Test
Form A, and a similar sample of students would respond to Test Form B. The
two tests would then be calibrated separately using the Rasch model in the
Wright-Mead CALFIT computer program. The resulting calibrations would
provide two scales with equal measurement units but different origins. The
passage difficulties in Test Form B can be placed on the same scale (with the
same origin) as Test Form A by adding a translation constant to all of the
item difficulties on Test Form B. The translation constant is calculated as
follows:

     Define the average Rasch difficulty for the items in
     passage A-6 as $d_{A6}$ and $d_{B1}$ for passage B-1, then the
     translation constant from Test Form B to Test A is
     defined as follows:

          $t_{BA} = d_{A6} - d_{B1}$

The average passage difficulties in Test B can thus be placed on the scale
of Test Form A by adding the translation constant $t_{BA}$ to all of the
average passage difficulties calculated for Test Form B. That is,
$d_{Bj(A)} = d_{Bj} + t_{BA}$ for each passage $j$ on Test Form B.

This form of elementary linking will place all passage difficulties on
Test Form B on a common scale with Test Form A. However, there is no
possibility of cross-checking the stability of these translations to a common
scale, at least not within this elementary design.
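
The elementary link can be written out as a short sketch. The difficulty values below are hypothetical and chosen only for illustration; the function names are likewise assumptions, not part of the CALFIT program. The sketch computes the translation constant from the common passage (A-6 on Form A, B-1 on Form B) and adds it to every average passage difficulty on Form B.

```python
def translation_constant(link_on_a, link_on_b):
    """Shift that moves Form B difficulties onto the Form A scale.

    Both arguments are average difficulties of the same common passage,
    calibrated separately on Form A and on Form B.
    """
    return link_on_a - link_on_b


def translate(difficulties, constant):
    """Add the translation constant to every average passage difficulty."""
    return {name: d + constant for name, d in difficulties.items()}


# Hypothetical average passage difficulties (logits), each form on its own scale.
form_a = {"A-1": -1.5, "A-2": -0.75, "A-3": -0.25,
          "A-4": 0.25, "A-5": 0.75, "A-6": 1.25}
form_b = {"B-1": -1.5, "B-2": -0.5, "B-3": 0.0,
          "B-4": 0.5, "B-5": 1.0, "B-6": 1.5}

t = translation_constant(form_a["A-6"], form_b["B-1"])
on_a_scale = translate(form_b, t)
```

After translation, passage B-1 carries the same difficulty as A-6 by construction, which is also why this design offers no independent check: a single link always closes exactly.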

A slightly more complex linking scheme allows for cross-checking the
placement of passages on a common scale. In this design, three tests are
linked in a triangle, and passages A, B, and C are each common to two tests.
(This basic, triangular pattern will form the core of the final calibration
design.) These common relationships can also be noted as:

     Passage A is common to Tests 2 and 1
     Passage B is common to Tests 3 and 2
     Passage C is common to Tests 1 and 3

The translation constants from test to test are denoted as $t_{A21}$,
$t_{B32}$, and $t_{C13}$. These translation constants are calculated the same
way they were for the basic link, that is, each is the difference between the
average difficulties obtained for the common passage on the two tests it
links.

With this basic, triangular, cross-checking calibration design, an
estimate of the consistency with which passages are placed on a common scale
is obtained from the expected value of the sum of the translation constants.
That is, $t_{A21} + t_{B32} + t_{C13} = 0$. The expected value of the sum of
the translation constants is zero. When there is a deviation from zero, an
adjustment is made to equalize the relative values of the translation
constants, while maintaining their sum at zero. This triangular unit for
estimating the translation constants is the basic unit of analysis to be used
in the complete calibration design or network.

The application of the triangular unit of analysis in the final
calibration design is illustrated in Figure 7.7. This is a small portion of
the entire final calibration design. Each six-pointed star is a complete test
form. (Each test form will be administered to approximately 175 students.)
For illustrative purposes, an enlargement of a portion of the network is
considered here, in which the triangular unit of analysis, previously
discussed, is marked by dashes. The three passages that make up this unit of
analysis are each common to two test forms. The translation constants from
test to test would be calculated as before, each as the difference between
the two average difficulties obtained for a common passage.

These translation constants must also sum to zero. If they do not, minor
adjustments will be made in the individual translation constants in order to
equalize them. (Thousands of calculations must be made to translate every
passage in the network onto a common scale.)
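
One simple way to carry out such an adjustment is sketched below. The report does not specify the adjustment rule, so this example assumes the closure error of a triangle is spread equally over its three translation constants; the function name and the values are hypothetical.

```python
def equalize_triangle(t1, t2, t3):
    """Adjust three translation constants around one triangle to sum to zero.

    Assumption for this sketch: the closure error (the amount by which the
    three constants fail to sum to zero) is divided equally among them.
    Returns the adjusted constants and the original closure error.
    """
    closure = t1 + t2 + t3
    correction = closure / 3.0
    adjusted = (t1 - correction, t2 - correction, t3 - correction)
    return adjusted, closure


# Hypothetical constants whose sum misses zero by 0.75 logits.
adjusted, closure = equalize_triangle(1.0, 0.5, -0.75)
```

The size of the closure error is itself informative: it estimates how consistently the three forms place their shared passages on a common scale. In a full network a constant can belong to more than one triangle, so an equal split within one triangle can disturb its neighbors; a least-squares adjustment over the whole network would be one alternative.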

The complete calibration design will involve the calibration of all
passages in the Test Development Notebook onto a common scale. The portion
of the calibration network illustrated in Figure 7.7 refers to actual
passages from particular content domains. The "network passage codes" for the
particular content domains and sub-domains are as follows:

     Textual Domains                    Network Passage Code

     1. Reading/Literature                       L
     2. Language Arts                            T
     3. Social Studies                           S
     4. Science                                  I
     5. Mathematics                              A

     Citizen Domain

     6. Newspapers                               M
     7. Magazines                                M

     Consumer Domain                             C

     Reference Domain                            R

The sample portion of the original calibration design network in Figure 7.7
illustrates the use of the passage codes. It includes three tests that will
span three readability levels, namely, Levels 2, 3, and 4.

The first test in the upper left-hand corner of the example is composed of
2 Reading/Literature (L) passages--L 2,2; L 6,2--and 4 Social Studies (S)
passages--S 8,3; S 13,3; S 12,3; S 11,3. The exact classification of
each passage by identification number, readability level, and duplication
(i.e., whether the passage is being used more than once) is denoted as
follows:

     L 2,2 D
       | | |
       | | +-- (Duplicate)
       | +---- (Readability Level)
       +------ (I.D. Number)

This would indicate passage number 2 at readability level 2 and the fact
that it is being used as a duplicate to complete this test form.

As mentioned previously, a data management system will be used to
monitor the manipulations and calculations necessary to place all passages on
a common scale. This system will include the following major operations:

     1. Calculate a complete Rasch analysis on each form;

     2. Average the Rasch difficulties for each passage
        within each test form;

     3. Merge identification numbers with each passage
        from the complete calibration design or network;

     4. Calculate all possible translation constants
        indicated by the identification numbers;

     5. Equalize the translation constants within each
        triangular unit of analysis such that the impact
        on adjacent translation constants is minimized,
        at least within a difficulty level or unique
        content area;

     6. Determine the passage with the lowest possible
        difficulty, and begin to translate passages from
        separate forms onto this original difficulty scale;
        continue the translation throughout the calibration
        network until all passages are calibrated on the
        same equal-interval scale with a meaningful zero.
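
The final operation, carrying the translation through the network, can be sketched as a graph traversal. This is an illustrative outline only, not the data management system the report proposes: form names, link values, and the choice of a breadth-first walk from a reference form are all assumptions. Each link is a tuple (source form, target form, constant), where the constant is added to source-form difficulties to place them on the target form's scale.

```python
from collections import deque


def chain_forms(links, reference):
    """Return, for each reachable form, the shift onto the reference scale.

    links     -- iterable of (source, target, constant); adding `constant`
                 to difficulties on `source` places them on `target`'s scale
    reference -- the form whose scale serves as the common origin
    """
    adjacency = {}
    for source, target, constant in links:
        # From a known target, the source form needs +constant more shift;
        # from a known source, the target form needs -constant.
        adjacency.setdefault(target, []).append((source, constant))
        adjacency.setdefault(source, []).append((target, -constant))

    shifts = {reference: 0.0}
    queue = deque([reference])
    while queue:
        current = queue.popleft()
        for other, constant in adjacency.get(current, []):
            if other not in shifts:
                shifts[other] = shifts[current] + constant
                queue.append(other)
    return shifts


# Hypothetical network: B links to A, C links to B, and A links to D.
links = [("B", "A", 2.75), ("C", "B", 1.5), ("A", "D", 0.5)]
shifts = chain_forms(links, reference="A")
```

Adding `shifts[form]` to every average passage difficulty on a form places that form on the reference scale. The walk assumes the translation constants have already been equalized within each triangle, so that the shift assigned to a form does not depend on the path taken to reach it.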



This calibration design assumes that changing from one content area to
another will not affect the measurement of literal comprehension with the
multiple-choice cloze test. Due to the complex interrelationships among
passages in the present calibration design, it will be possible to test the
effect of content area (e.g., science versus math) on the measurement of
literal comprehension. There is little possibility that there will be a
"content effect." If there is, the passages will be recalibrated within
their respective content domains and organized accordingly within the Test
Development Notebook.

Implementation plan. The implementation plan for the calibration design
will parallel the validation plan but will require considerable "front-end"
work in the areas of planning, computer programming, computer-managed data
base and monitoring systems. The complexity of the final calibration design
will be more manageable, pending considerable research and development via an
experimental implementation of a reduced calibration design.

The time line for implementing the calibration design will be
coordinated with the time line for the validation and productivity plan. For
convenience, the overall research and development time line is provided again
in Figure 7.8. The three major research components (validation, calibration
and productivity) will be implemented sequentially, with overlapping phases.

A more detailed time line for the calibration research component is
presented in Figure 7.9. This figure summarizes the timing of major tasks,
the delivery of products, and the assistance of the external review panel.
The major tasks include:

     1. Planning with input from appropriate consultants of
        the external review panel;

     2. The immediate development and implementation of a
        pilot study of the feasibility of the experimental
        calibration design network;

     3. A major effort to modify the CALFIT computer program
        to conform its design with data management needs;

     4. The design and up-keep of a data management and filing
        system that is mostly computerized;

     5. Finalization of the complete calibration design network,
        including improvements based on information synthesized
        from the pilot study;

     6. The preparation of 450 or more unique test forms based
        on the calibration design network;

     7. Implementation of a major data-collection effort based
        on the calibration design network with 150 to 200
        students responding to each test form;

     8. Major data analysis and calibration of individual
        passages including a determination of a possible "content
        effect" and the placement of each passage on a common,
        equal-interval, "zero-point" scale;

     9. The calculation of derived scores that will improve the
        interpretation of test scores at the local level;

    10. Completion of a final report on the calibration of all
        multiple-choice cloze passages in the Test Development
        Notebook, including a delineation of the use of derived
        scores to improve the interpretability of test scores
        at the local level.

The schedule for completion of major products during the implementation
of the calibration plan is set forth in Figure 7.9.

As noted, the calibration plan will be directed from the Bureau of
School and Cultural Research and coordinated with the validation plan for
the multiple-choice cloze test.

CHAPTER VIII

ANALYSIS OF CLOZE PASSAGES AND ITEMS

This chapter describes three related but distinct phases of an
analysis of the multiple-choice cloze exercises administered during May and
June 1975 to approximately 5,000 students in an urban school district in
upstate New York. As indicated in Chapter V, the cloze exercises consisted
of three levels of test forms: Level I, administered to students in grades
1, 2, and 3; Level II, administered to students in grades 4, 5, and 6;
and Level III, administered to students in grades 7, 8, and 9. At each
level there were 12 separate test forms, for a total of 36 test forms. All
passages on the 36 test forms were taken from the reading/literature domain.
Each test form contained six passages and, with exceptions at Level I, each
passage was accompanied by 10 multiple-choice items. Generally, then, each
test form featured 60 multiple-choice items. Since the majority of students
taking the Level I tests were first and second graders, the Level I test
forms each contained three shorter passages with fewer items (i.e., either
three or five items). Thus, the Level I test forms vary from the general
pattern in that they contain either 39 or 41 items, instead of 60.

Passages were randomly selected in order to assure that all test forms
at a given test level (i.e., Levels I, II, or III) would be as nearly
equivalent as possible. The passages on every test form were arranged in
ascending order of readability (i.e., from the least to the most difficult).
Each test form at a given level, then, represented the same range of reading
difficulty.

Utility of Analysis

The 36 cloze test forms and the data derived from their administration
are the materials upon which the analysis reported in this chapter is based.
Since the readability levels of all cloze passages were determined by the
Spache and Dale-Chall readability formulas, and since the formats and items
were governed by the same constructional procedures, any sampling of cloze
materials should be representative of the entire corpus.

The analysis here reported represents the first systematic inspection
of the cloze materials. It affords the first opportunity of determining how
successfully the cloze materials conform to expectations. The analysis had
two basic purposes: One was to determine the need for modification of the
cloze format and procedures to assure maximum consistency and objective
reproducibility of passages and items; the other was to study the
effectiveness of the Spache and Dale-Chall readability formulas in ranking
passages by difficulty in order to provide preliminary guidelines for the
selection of passages for test assembly.

Phases of Analysis

The first phase of the analysis was a critical examination of the
cloze test forms largely completed prior to availability of test result
data but informed where necessary by the data as it became available. It
was intended to discover and categorize ostensible flaws in passages and
items which might either represent inconsistent application of cloze
procedures or predict deviancies in the behavior of passages and items on
tests. The second phase of the analysis examined the effectiveness of
readability formulas in ranking passages by difficulty. This involved the
calculation (using Rasch analysis data) of passage easiness scores by grade
and across grades, for each passage at each test level, and the inspection of
these scores both by test form and by designated readability level. The third
phase of the analysis featured an inspection of items on 12 test forms
(four at each testing level) aimed at identifying and explaining deviant
items in order to determine the need for further revisions of cloze
procedures, to suggest modifications which might improve cloze materials,
and to make preliminary observations on the unidimensionality of the cloze
materials.¹ Though these three analytical phases are substantially
interdependent and mutually informative, they will be described separately
for the sake of simplification and clarity.

Phase 1--Critical Examination of Cloze Passages and Items

This phase of the analysis was initiated as a review of the 36 cloze
test forms for the sake of assuring consistency among the items. Two
steps were involved. The first step was the actual review of every item
on every test form. This step resulted in the identification of a number
of items which were flawed either in terms of violations of existing cloze
rules or procedures or in terms of desirable revisions in the cloze
procedures which came to light during the review process. The second step
in this phase of the analysis involved determining the extent to which
flawed items were able to predict items subsequently revealed as deviant
by the test result data.²

¹These observations involved inspection of misfitting items. Items
with high fit mean squares on the Rasch measurement model are identified
as misfitting or as measuring characteristics other than those which the
items were intended to measure. Misfitting items on the cloze tests are
assumed to be measuring some trait other than or in addition to literal
comprehension.

²The criteria for identification of deviant items are presented in
Phase 3.

Step 1--Review Procedures and Results

Distractors for every item on every test form were inspected in
context by at least two and occasionally three reviewers. Each reviewer
would independently examine every passage on a test form, for every item
testing each distractor in the space left in the passage by the deleted
word, and noting items which seemed flawed. Then at least two reviewers
would examine the test form together, discussing items with possible flaws
and listing in categories all items thus identified as flawed.

The four categories of flawed items identified by this review include
grammatically (syntactically) implausible distractors, semantically plausible
distractors, idioms, and errors.

Grammatically (syntactically) implausible distractors. Cloze
procedures require that distractors be taken from part-of-speech lists
identical to the part of speech (i.e., function in context) of the deleted
word. Many of the flawed items in this category resulted from the use of
an inappropriate part-of-speech list. Others had distractors disagreeing
with deleted verbs in tense or number or distractors disagreeing with
determiners (a, an, the) preceding deleted words. But the most frequent
problem was unworkable distractors taken from the appropriate part-of-speech
list. That is, every word which can function as the same part of speech as
a deleted word cannot necessarily function in the same contextual position
as that deleted word. For example, if the word town, a noun in the sentence
"John went to town last weekend," is deleted, not every possible noun will
be grammatically plausible in the position vacated by town. Consider such
nouns as house or cow. In this context they are grammatically implausible.
Semantically plausible distractors. Cloze procedures do not permit
the use of synonyms as distractors for deleted words. Some synonyms were
found among the distractors (e.g., swift, quick). But the majority of
these kinds of flawed items involved distractors which, though not
synonyms for deleted words, were nonetheless contextually plausible. In
the sentence "The children were singing loudly," such distractors as
happily or merrily would be plausible, though not synonymous, substitutes
for loudly.

Idioms. Defined for cloze procedures as any word for which no
grammatically plausible and simultaneously semantically implausible
distractors can be found, an idiom is illustrated by the following: "It
is any attempt at written communication." Any grammatically plausible
substitute for any (e.g., some, one, every) would also be semantically
plausible.

Errors. This category of flawed items primarily involved such things
as spelling or typographical errors in passages or distractors.

Flawed items amounted to approximately 25% of the total number of
items on the 36 test forms. They were distributed among the four
categories as follows: grammatically implausible distractors, 58%;
semantically plausible distractors, 19%; idioms, 19%; errors, 4%.

Three other types of flaws in the cloze materials were noted during
the course of the critical review. These involved titles, passage
coherence, and violations of cloze rules regarding the number of words
between deletions. Titles presented two different kinds of problems. The
first is a title which is inappropriate (i.e., either misleading or
unrelated) to the passage which it precedes. The second involves a title
which cues one or more of the items (i.e., contains one or more of the
words deleted from the passage). Passage incoherence is produced by any
violent shift in direction or change in topic; sentences not clearly
related to or following from one another result in incoherence. The
concept of literal comprehension upon which the cloze test is based
assumes that apprehension of the meaning carried by a word deleted from a
passage depends on a certain percentage of intact surrounding context.
Thus, stipulations regarding number of words between deletions were part
of the cloze rules or procedures. When the pattern of deletions in cloze
passages did not adhere to these stipulations, such violations were noted.
Though problems with titles, passage coherence, and deletion patterns were
observed and noted, the frequency of their occurrence was not tabulated.

As a result of the first step in the critical review phase of the
analysis, two courses of action were initiated: revision of the cloze
rules and review of all cloze materials.

The flaws which the first step in the critical review discovered in
the cloze test forms revealed some apparent inconsistencies in the
application of the cloze procedure. Such inconsistencies, clearly
attributable to the experimental and evolutionary nature of the
development of the cloze materials, reflected several problems which
remained unresolved at the completion of the cloze corpus; to wit, the
cloze rules were marred by excessive complexity, numerous exceptions, and
insufficient precision.

Review of the test forms having pointed up the necessity for revision
of the cloze rules and having identified problem aspects of the cloze
materials, revision of the extant cloze rules was begun. After thorough
and intensive discussion of the intended nature of the cloze materials and
of procedures essential to the achievement or production of such
materials, a carefully revised series of cloze rules (i.e., "Rules for
Application of the Cloze Procedure," Appendix A) was formulated.

The revised cloze rules are extremely important in several respects.
The current rules can be applied with greater assurance of uniformity. In
other words, because the current rules are much more efficiently and
practically applicable, they assure the production of passages and items
with the highest degree of objective reproducibility. Further, the current
rules represent an intermediate step crucial to the ultimate development
of an algorithm which will permit the maximum degree of computerization
of the cloze procedure and, thus, the maximum speed, efficiency, and
practicality in the production of cloze materials. Finally, the revised
cloze rules have facilitated the implementation of a review of the entire
corpus of cloze materials, a review which will greatly enhance the
consistency and, thus, the practical utility of the cloze materials.

The review of the cloze corpus now underway is a systematic and
thorough one which endeavors to consider every aspect of the cloze passages
and items relevant to quality and consistency. Since this review is
intended to assure uniform application of the cloze rules, it should
effectuate the greatest possible degree of standardization within the
cloze passage format and the items. Moreover, since the review is to
correct all flawed items and replace those few passages found to be
unacceptable in the light of the revised cloze rules, the corpus of cloze
materials remaining at the conclusion of the review promises to achieve as
nearly as possible that unidimensionality in a testing device which is
such a critical need in the measurement of literal comprehension.

Step 2--Flawed Items as Predictors of Statistical Deviance

The first step in the critical examination phase of the analysis of
the 36 cloze test forms identified a number of flaws. The following
question arose in response to the identification of these flaws: To what
extent do the identified flaws anticipate, predict, or explain
statistically deviant items? The reviewers' assumptions were that flaws
would produce predictable statistical deviance.

Specific predictions. It was expected that grammatically implausible
distractors and idioms would produce unusually easy items, because the
distractors would be so obviously uncompetitive. It was assumed that
semantically plausible distractors would produce unusually difficult items
because of the competing distractor(s). Errors were expected to confuse
students and thus create difficult items. Titles cueing items were
expected to produce giveaways or easy items, while passage incoherence and
violations in deletion patterns were expected to make items difficult.

Before these predictions about the relationships between identified
flaws and statistically deviant items could be tested, certain calculations
based on statistical data provided by the Rasch measurement model analysis
had to be performed. Since these calculations and commentary thereon
constitute the second phase of the total analysis of the cloze test forms,
the full discussion of the procedures and implications related to these
calculations will be withheld at this time in favor of a very brief
summary. Questions arising in response to the following summary are
referred to later sections of this chapter, Phase 2 and Phase 3.

The Rasch measurement model involves many statistical analyses, among
them the analysis of the easiness of items (i.e., the proportion of
students correctly responding to an item). Thus, if 100 students respond
to an item with 75 answering correctly, the easiness of the item is .75.
Since the Rasch model provided easiness data on every item on every
passage on all 36 test forms, averaging the easiness of the items on a
passage would give what was termed the passage easiness. A deviant item
(for the purposes of this phase of the cloze analysis) was one whose
easiness varied by a given amount from the passage easiness.
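
The two quantities just defined can be illustrated with a minimal
sketch. The function names and the passage values below are hypothetical
and are not taken from the report; only the 75-of-100 example comes from
the text.

```python
from statistics import mean


def item_easiness(num_correct: int, num_respondents: int) -> float:
    """Easiness of one item: proportion of respondents answering correctly."""
    if num_respondents <= 0:
        raise ValueError("num_respondents must be positive")
    return num_correct / num_respondents


def passage_easiness(item_easiness_values: list[float]) -> float:
    """Passage easiness: mean easiness of the items on the passage."""
    return mean(item_easiness_values)


# The report's own example: 75 of 100 students answer correctly.
example_item = item_easiness(75, 100)

# Hypothetical easiness values for the items on one passage.
example_passage = passage_easiness([0.75, 0.50, 0.25])
```

Under these assumptions, an item's deviation is simply its easiness minus
the easiness of its passage.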

The Rasch model also provides response frequency data on every test
item. That is, the Rasch model shows, given an item with five alternatives
(i.e., the correct answer plus four distractors), how many times each of
the alternatives was selected.

The predictive accuracy of the assumptions regarding the identified
flaws was tested in terms both of item deviance and of distractor response
frequency. That is, if an item had been identified as flawed, both its
easiness score and the distribution of responses to its distractors were
examined. Thus, it is conceivable that while an item flawed by a
grammatically implausible distractor might not have a deviant easiness
score, the flawed distractor might be dysfunctional or uncompetitive
(i.e., a dysfunctional distractor is one which is not selected).
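
As an illustration of the response frequency check, the sketch below
lists the distractors of a five-alternative item that no student
selected. The function name and the response counts are hypothetical;
the report does not give its tabulation procedure.

```python
def dysfunctional_distractors(response_counts: dict[str, int], correct: str) -> list[str]:
    """Return the distractors (incorrect alternatives) that were never selected."""
    if correct not in response_counts:
        raise KeyError("the correct alternative must appear in response_counts")
    return [
        alternative
        for alternative, count in response_counts.items()
        if alternative != correct and count == 0
    ]


# Hypothetical response frequencies for one item; (c) is the correct answer.
counts = {"a": 12, "b": 0, "c": 75, "d": 8, "e": 5}
unselected = dysfunctional_distractors(counts, correct="c")
```

Here alternative (b) drew no responses and would be treated as
dysfunctional, even though the item's easiness (.75) might not be deviant.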

Observations. The testing of flawed items against item deviancy and
response frequency data on a sample of 12 test forms (four from each of
the three testing levels) produced the following observations. Generally
speaking, the categorizations of flawed items proved to have little
accuracy as predictors of item performance. Specifically, grammatically
implausible distractors had almost no observable bearing on item deviance.
Idioms, semantically plausible distractors, errors, and title cueing had
some relationship to item deviance, but that relationship was relatively
slight and often contrary to expectations. (Effects of passage incoherence
and deletion pattern violations were not systematically observed at this
time, but will be discussed during the description of Phase 3 of the total
analysis.)

There was very little observable relationship between grammatically
implausible distractors and item easiness scores. In other words,
grammatically implausible distractors seemed to function no differently,
no less adequately, than other distractors. The only exceptions involved
a very few items in which distractors identified as grammatically
implausible proved to be dysfunctional. These would include only those
items which were difficult enough to produce a relatively high proportion
of distractor selections. Given such an item, the one grammatically
implausible distractor (i.e., one out of four, usually) would have
received virtually no attention. Put another way, grammatically
implausible distractors, on items prompting much guessing, would be the
distractors which even the guessers would refuse to select. Again, it must
be stressed that the frequency of such items was very slight.
Grammatically implausible distractors, then, have little utility as
predictors of item deviancy.

Semantically plausible distractors, idioms, and errors were somewhat
more useful as predictors of item deviance; they were often able to
identify difficult items. To elaborate, a high proportion of distractors
identified as semantically plausible proved, in fact, to be highly
competitive with the correct answers. This was as expected. That idioms
occasionally identified deviance is not unexpected, but it is somewhat
surprising that idioms identified generally difficult items, rather than
easy ones. Though the expectation was that idioms would make easy items,
it transpired that, judging from the distribution of responses to
distractors on such items, they apparently promoted much guessing. Idioms,
then, did not behave exactly as predicted, but they did identify deviancy.
Errors similarly produced difficult items, items involving more than usual
degrees of guessing. This again is what one would expect; fortunately,
there were very few items flawed by errors.

It was also observed that some titles which ostensibly cued items
did indeed make such items generally less difficult. However, no pattern
was observed in the effects on items or passages of titles which were
misleading or irrelevant.

Though the categories of item flaws developed in the first step of
this phase of the cloze test form analysis proved to have only slight
utility in predicting item deviance, the experience of critically
examining the cloze passages and compiling the categories has been very
useful in several other ways. The critical examination of the test forms
confirmed the need for a revision of the cloze rules; such a revision has
occurred, and its utility has been discussed. The critical examination
also pointed up the need for a systematic review of the entire cloze
corpus, and this review, now underway, promises significant improvement of
the cloze materials. Flawed items involving idioms, errors, and
semantically plausible distractors have led both to the clarification and
simplification of the cloze rules and to the improvement of existing cloze
materials. Flaws involving grammatically implausible distractors, though
identifying scant item deviance, have also resulted in revisions of cloze
rules and led to a review of the entire cloze corpus which will improve
face validity of the materials. Further, the inconsistent effects of
titles (combined with the difficulties implicit in attempting to control
title-writing) have led to the decision to eliminate the title requirement
in future applications of the cloze procedure. The item flaws have also
made it clear that the most efficient way to identify item deviance is by
inspection of the data resulting from administration of the test forms (see
Phase 3). But the experience of examining test forms critically has
augmented the acuity and sensitivity of the inspection of the test result
data, especially by anticipating explanations of several typical kinds of
item deviance.

Phase 2--Analysis of Readability Formula Utility

The second phase of the analysis of the cloze test forms involved the
computation and categorization of passage easiness data for every passage
on all 36 test forms. The first purpose of this analysis was to determine
whether the test assembly procedures had produced test forms with certain
similar (i.e., as nearly equivalent as possible) characteristics. Those
characteristics were the range of passage difficulty per test form and the
incremental pattern of difficulty among the passages on each test form at
each testing level (i.e., Levels I, II, and III). The second purpose of
this analysis was to determine the accuracy and utility of the Spache and
Dale-Chall readability formulas as indicators of the relative difficulty
of passages for students at given grade levels. This information was
desired as a preliminary basis for guiding potential users of the cloze
materials in the selection of passages.

The test assembly procedures were intended to produce 12 test forms
at each of three testing levels, each form featuring passages of the same
range of difficulty arranged in ascending order of difficulty. If test
assembly procedures were successful, then easiness data (provided by the
Rasch model) would show that the overall difficulty of a given test form
did not vary dramatically from the overall difficulty of the other test
forms at the same testing level, and that passages on all test forms were
arranged so that each succeeding passage was more difficult than its
predecessor.
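
The ordering condition described above (each succeeding passage harder,
that is, lower in easiness, than its predecessor) can be checked with a
short sketch. The function name and the six easiness values are
hypothetical and serve only to illustrate the check.

```python
def out_of_order_positions(passage_easiness: list[float]) -> list[int]:
    """Return 1-based positions of passages that are not harder than their predecessor.

    A passage is "harder" when its easiness is strictly lower.
    """
    return [
        position + 1
        for position in range(1, len(passage_easiness))
        if passage_easiness[position] >= passage_easiness[position - 1]
    ]


# Hypothetical easiness values for the six passages on one test form.
form = [0.82, 0.71, 0.74, 0.60, 0.52, 0.41]
violations = out_of_order_positions(form)
```

In this invented form the third passage is slightly easier than the
second, so position 3 is reported as a departure from the intended
ascending order of difficulty.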

The Rasch measurement model provides data on the easiness of every
item on a given test form, arranged both within grade and across grades.
That is, at a given test level, say, Level I, the Rasch model provides
easiness data for grades 1, 2, and 3, and for grades 1 to 3 inclusive.
For all 36 cloze test forms, passage easiness for each passage (i.e., the
percentage of correct responses to items, averaged by passage) was
computed by grade and across grades. Table 8.6 in Appendix B contains
passage easiness data for each test form, referenced to Test Development
Notebook (TDN) identification numbers assigned to each passage.

The passage easiness data confirm that the desired pattern of passage
difficulty on the test forms has generally been attained. For the most part
each succeeding passage on a given test form has a lower easiness than its
predecessor. This pattern is consistent both across grades and within
grades. The variations from this pattern are generally slight. Such
variations, furthermore, are largely explainable by the expected overlap
of readability scores. That is, the Dale-Chall readability formula produces
a score which is then converted to a grade range. For example, a Dale-Chall
score between 6.0 and 6.9 would, in conventional applications, indicate
material for grades 7-8. For cloze purposes, however, such a score range
was converted into four distinct readability levels (i.e., 13, 14, 15, 16).
It is not surprising or unusual, then, that the readability scores on the
cloze passages did not always predict actual passage difficulty with
absolute accuracy.

Much of the overall slight variation from the desired pattern of
increasing passage difficulty within a test form, then, was anticipated.
But some of the variation is attributable to the accidental juxtaposition
of somewhat deviant passages. For example, occasionally the second passage
on a test form may be somewhat more difficult (on the basis of student
performance) than most of the passages taken from its readability pool.
Given that situation, and then given that the third passage on the test
form may be an example of the opposite phenomenon (i.e., the passage
selected may be somewhat easier than the other passages in its readability
pool), such a juxtaposition of passages not surprisingly results in a
variation from the desired pattern (i.e., the third passage on a test form
is supposed to be more difficult than the second).

Generally speaking, then, the desired pattern of ascent in difficulty
of the passages on each test form was attained. Thus, students taking the
tests were confronted with similar tasks. If this had not been the case,
if, say, some students had test forms with passages arranged in descending
order of difficulty, then the tasks confronting the students, as well as
the test-taking conditions or circumstances, would have varied greatly,
thus resulting in widely differing anxiety levels among students and
rendering questionable any implications derived from comparisons of test
result data on different forms.

The basic similarity in difficulty of the test forms at each testing
level is further verified by a comparison of the mean easiness³ across
grades on each test form at each testing level using the data in Table 8.1.

Table 8.1

Mean Passage Easiness Across Grades by Level

       Level I                Level II               Level III
  Form   Mean Easiness   Form   Mean Easiness   Form   Mean Easiness
    1        58.17
    2        56.50
    3        60.83
    4        60.67
    5        61.50
    6        59.50
    7        59.83
    8        56.83
    9        60.17
   10        58.50
   11        61.00
   12        60.00         24       64.83         36       69.50

Entries for Forms 13-23 and 25-35 are illegible in the source; one
further value, 63.50, survives without a legible form number.

Mean easiness ranges for Levels I, II, and III are 56.50-61.50,
[illegible], and [illegible]-71.83, respectively.


³ Mean easiness provides an index to test form difficulty which is
equivalent to the mean score. Mean easiness is the average percentage of
correct responses, while mean score is the average number of correct
responses.

As Table 8.1 shows, the variation in mean easiness scores (i.e., the
average passage easiness of the six passages on a test form) per test form
per level is slight. Again, grossly different mean easiness scores would
have suggested some basic incomparability among the test forms, but the
scores reveal no gross differences. Indeed, the fact that the range of
mean easiness scores on the test forms at Level I is only 5.00 (i.e., low,
56.50; high, 61.50) suggests near equivalence among the test forms at
that level. Such a narrow range of scores is as expected, since the
passages for each test form were derived from a single domain and were
systematically selected to include the same range of readability.

As expected, the range of mean easiness scores on the test forms at
both Level II and Level III (i.e., 11.00 and 10.66, respectively) is
broader than the range at Level I. These broader ranges are intuitively
reasonable, because one would expect to see, among students in grades 4
through 9, wider variations in literal comprehension.
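
The Level I range quoted above can be reproduced from the twelve Level I
values in Table 8.1. The sketch below assumes those values as transcribed
from the table; the variable and function names are illustrative only.
Values for Levels II and III are not included because most of them are
not legible in the source.

```python
# Mean passage easiness (percent correct) for Level I, Forms 1-12,
# as transcribed from Table 8.1.
level_i_mean_easiness = [
    58.17, 56.50, 60.83, 60.67, 61.50, 59.50,
    59.83, 56.83, 60.17, 58.50, 61.00, 60.00,
]


def easiness_range(values: list[float]) -> float:
    """Spread between the easiest and the hardest test form."""
    return max(values) - min(values)


level_i_range = easiness_range(level_i_mean_easiness)
print(f"Level I range: {level_i_range:.2f}")
```

The result agrees with the 5.00 reported in the text (low 56.50, high
61.50).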

Thus, both the conformity to expectations of the pattern of ascending
passage difficulty on the test forms and the absence of any gross variations
in mean easiness scores per test form per level seem to suggest that the
test assembly procedures have succeeded in achieving the desired product:
test forms similarly graduated and similar in difficulty. These surface
similarities among the test forms lend credence to the proposition that
they are equivalent measures of the same reading-related ability--literal
comprehension.

Easiness averages for each passage on all 36 test forms are categorized
(Table 8.7 in Appendix B) by passage readability levels both within grade
and across grades. The easiness of every passage on every test form at a
given testing level is compared to the easiness of all the other passages
at the same readability level (determined by the Spache and Dale-Chall
formulas). Thus, identification of passages which vary from expected
performance is easy and convenient.

Just as the mean easiness of test forms at a given level spanned only
a narrow range, so is the easiness range of passages at given readability
levels similarly narrow. Specifically, at Level I the range of easiness of
passages at a given readability level (i.e., identical to the readability
level pools from which the passages were selected during test assembly)
seldom exceeds .17 (e.g., the 12 passages from the pool representing
readability levels 5 and 6 range in easiness from .43 to .59). Thus, any
passage at Level I which broadens the easiness range at a given readability
level beyond .17 must be considered deviant. The demarcation by which
passages at Levels II and III were identified as deviant was based on an
easiness range of .21. Given these ranges as touchstones for identifying
deviant passages, of 216 total passages on the 36 test forms, only 11 were
deviant (5%): 1 at Level I, 4 at Level II, and 6 at Level III. (An
explanation of deviant passages is contained in the discussion of the
third phase of the analysis.)

A more condensed schematization of the easiness of passages by
readability levels is featured in Table 8.2. As this table illustrates, as
the readability levels of passages increase, their easiness scores
decrease. This pattern is generally maintained both within and across
grades. An obvious and expected variation is that among students in higher
grades the range of scores on passages at given readability levels is
generally narrower and the upper extremity of scores is generally higher
than among students at lower grades. An unexpected variation is that at
Level III the upper extremity of scores on passages at readability levels
21 and 22 is higher among grade 8 than among grade 9 students. A possible
explanation is that testing time for ninth-graders was limited because of
variations in length of class periods.

Given that the readability formulas used to categorize passages by
readability levels were employed in the full awareness and expectation that
such formulas are only useful for making relatively coarse discriminations
among passages, the easiness of passages within readability levels attests
to a remarkable degree of accuracy in the determination of passage
difficulty independent of test result data. Thus, the practical goal of the
establishment of preliminary guidelines for the selection of passages by
given readability levels has been met. Table 8.7 in Appendix B could serve
as a guide to users of extant cloze materials in the selection of passages
for the purpose of assembling test forms related to given readability
levels.

The Rasch measurement model is a statistical technique for the
calibration of passages by difficulty. It is a much finer and more
sensitive method of calibration than readability formulas have been able
to achieve, thus permitting an extremely accurate calibration of the cloze
materials. Implementation of the Rasch model will greatly increase the
utility of the cloze materials for each of their testing purposes (i.e.,
survey testing, achievement monitoring, and diagnostic testing).

Phase 3--Analysis of Item Deviance

The purpose of this phase of the analysis is fourfold. First, the
analysis identifies the deviant items in a sampling of 12 of the 36 cloze
test forms (4 at each testing level), discusses factors contributing to
deviance, and draws tentative conclusions about the implications of item
deviance for the continued development of the cloze materials. Secondly,
the analysis discusses the 11 passages mentioned in Phase 2 as deviant,
examines factors contributing to passage deviance, and suggests possible
courses of action in response to passage deviance. Thirdly, the analysis
makes observations and draws tentative conclusions about the relationship
between deviance and item types (i.e., nouns, verbs, adjectives, adverbs).
Finally, the analysis discusses the purposes and utility of the fit mean
square both as an aid to passage calibration and as a method for
identifying deviant items.

Deviant Items

Phase 1 of the analysis of the 36 cloze test forms had revealed the
impracticality, as a method of identifying deviant items, of attempting to
predict deviance based on flaws noted in the test forms prior to the
examination of test result data. The experience of performing the Phase 1
analysis also suggested that an inspection of response frequency data,
prior to or independent of consideration of easiness data, was similarly
inefficient. Thus, experience showed that the most practical method of
identifying deviant items was to examine easiness data.

An item was identified as deviant when its easiness score differed by
a given amount from the average easiness of the items on the passage in
which it occurred. Thus, to identify item deviance, Rasch easiness data
were inspected and the easiness of each item on a test form was compared
to the relevant passage easiness (computation of passage easiness was
described in the discussion of Phase 2 of the analysis). Items on Level I
test forms were identified as deviant when their easiness scores differed
by ±.12 from the relevant passage easiness. A difference of ±.13 at
Level II and ±.18 at Level III identified deviant items at those levels.
These indices for the identification of deviant items varied by level
because the range of item easiness scores per passage generally increased
as the readability levels of the passages and the grade of the students
increased.

After inspection of easiness data identified the deviant items on a
given test form, each deviant item was examined thoroughly in order to
determine, if possible, what factors (e.g., aspects of distractors, format,
or passage) might have contributed to the deviance. If it were possible to
generalize about characteristics of deviant items, clues might be
discovered which would enhance the progressive improvement and refinement
of the cloze materials. Thus, each deviant item was inspected within the
context of its passage and in terms of the performance of its distractors.
In other words, for each deviant item, both content and response frequency
data were examined.
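
The level-specific deviance criterion (a difference from passage easiness
of .12 at Level I, .13 at Level II, and .18 at Level III) can be
sketched as follows. This is only an illustration: it assumes that a
difference equal to or greater than the index, in either direction,
counts as deviant, which the report does not state explicitly. The
function name and the item values are hypothetical.

```python
from statistics import mean

# Deviance indices by testing level, as given in the text.
DEVIANCE_INDEX = {"I": 0.12, "II": 0.13, "III": 0.18}


def deviant_items(item_easiness: dict[int, float], level: str) -> dict[int, float]:
    """Return {item number: deviation} for the deviant items on one passage.

    Deviation is item easiness minus passage easiness, where passage
    easiness is the mean easiness of all items on the passage.
    Assumption: |deviation| >= index marks an item as deviant.
    """
    passage = mean(item_easiness.values())
    index = DEVIANCE_INDEX[level]
    return {
        number: round(easiness - passage, 2)
        for number, easiness in item_easiness.items()
        if abs(easiness - passage) >= index
    }


# Hypothetical item easiness values for one Level I passage.
passage_items = {1: 0.70, 2: 0.66, 3: 0.88, 4: 0.62, 5: 0.64}
flagged = deviant_items(passage_items, level="I")
```

With these invented values the passage easiness is .70, and only item 3
(easiness .88) departs from it by more than the Level I index. Each
flagged item would then be examined in context, as described above.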

The results of this first step in the analysis of deviant items are
presented in Table 8.8 in Appendix B. This table categorizes results based
on analysis of 12 test forms, four from each testing level. Some of the
test forms were randomly selected for analysis, and some were selected
because the Phase 2 analysis had revealed that they contained deviant
passages. Generally, the test forms analyzed are a fairly representative
sample.

Table 8.8 in Appendix B is organized by testing level. Deviant items
on the four forms analyzed at each level are identified by TDN passage
identification numbers. Next, the table presents the following data and
information: the easiness of the passage, the number of the item, the
easiness of the item, the fit mean square of the item, and the part of
speech of the item. Fit mean squares are provided only when they are high
(i.e., above 3.70). Though they will not be discussed until the last step
in Phase 3 of the analysis, the fit mean squares are provided here for
the sake of future reference. Similarly, parts of speech (code numbers 1,
2, 3, 4 refer to nouns, verbs, adjectives, and adverbs, respectively) and
item deviance will be discussed at a later time.

Finally, Table 8.8 in Appendix B offers interpretations of factors
which seem to explain or contribute to item deviance. These interpretations
are based on an analysis which considered both the nature of the passage
containing the deviant item(s), and the behavior of the distractors.
Thus, the interpretations refer both to context and to distractors. Since
the interpretations of factors contributing to item deviance are, for the
sake of economy, deliberately terse, each interpretive variation will now
be explained, and illustrated where necessary.

Common combinations. Some words characteristically occur with
great frequency in combination with certain other words. When words
illustrative of such familiarly heard combinations are clozed, they
typically result in unusually easy items. Thus, in the following phrases,
when the underlined word is clozed, the resulting item will
characteristically be easy for students at relevant grade levels: for the
first time, ran all the way home from school, brushed teeth.

Grammatically (syntactically) implausible distractor. Defined in the
discussion of Phase 1 of the analysis, such distractors, though
infrequently contributing to item deviance, do occasionally help to
explain easy items. The following is an item whose deviance is partially
explained by the grammatical implausibility of its distractors: "But
everybody ______ you can buy . . ."; (a) marries, (b) freezes,
(c) pioneers, (d) knows, (e) gains.

Semantically plausible distractor. Also defined in Phase 1, distractors
which can be substituted for deleted words without producing confusion or
meaninglessness are obvious explanations for some difficult items.
Interestingly, many of the deviant items which are explicable in terms of
semantically plausible distractors seem to reveal that for many test
takers, a distractor which is not semantically plausible in terms of the
entire passage context will appear plausible, and thus function
competitively, in a narrower or more restricted segment of context. Thus,
the following item illustrates a distractor which, though not plausible in
terms of the passage in its entirety, might indeed appear plausible if
context were restricted to a single sentence: "I wish that you had bought
us a ______ instead of another cow"; (a) star, (b) tower, (c) clock,
(d) slab, (e) colt. Though the passage which included this sentence was
about a family's frustrated desire for a clock, the sentence standing
alone would seem to require distractor (e), colt, for completion.

Deleted word above level of passage. Readability levels of passages
were based on Spache and Dale-Chall scores, which, among other factors,
depend on percentages of hard words. These words are determined by word
lists. The drawback of such word lists is that they do not distinguish the
degrees of difficulty of words. Thus, occasionally passages may contain an
extremely difficult word, the presence of which is obscured by the
readability score. Given such a passage, if the very hard word is clozed,
the resulting item will often, not surprisingly, be difficult. When
deviant items seemed to be explicable in terms of the difficulty of the
clozed word, such a resource as Harris and Jacobson's Basic Elementary
Reading Vocabularies (1972) was consulted. Thus, in the following example,
"the calf is lanky and scrawny," the passage in question had a third grade
readability score, but the clozed word, according to Harris and Jacobson,
was a core 6 (sixth grade) word.

Idiom. Previously defined (Phase 1), idioms which seemed to contribute
to item deviancy had the curious property of contributing to both hard and
easy items. Apparently, on some items the distractors were so wildly
implausible as to be largely uncompetitive, while on other items the
strangeness of the distractors may have confused some students and led them
to guess more than was expected. Example: "by a thousand or ____ of
. . ."; (a) triangle, (b) wish, (c) more, (d) lilt, (e) sparrow.

Insufficient contextual clues. Occasionally, contexts from which given
words are deleted do not obviously or explicitly cue the deleted words. In
such cases items vary in difficulty depending on the adventitious behavior
of distractors. In other words, in some cases where there is little or no
explicit contextual cueing of an item, distractors may be plausible in
a narrow segment of context, while in other similar cases it happens that
distractors are not inordinately competitive. Examples of the first sort
involve deviant items; examples of the second sort do not. Thus, in the
following example, though the larger context clearly implies the correct
answer, the absence of any explicit clue makes distractors (b) and (d)
excessively competitive: "Everyone was bargaining back and forth";
(a) snugly, (b) merrily, (c) loudly, (d) honestly, (e) painfully.

Title cues answer. A discussion of titles which cue correct answers
was also presented in Phase 1. Most cases of titles cueing correct answers
involve easy items. Example: title, "Shopping at the Fish Market"; item,
"We came upon the fish market." However, there are exceptions to this
generalization. Not every instance of a title containing a word which is
subsequently deleted results in deviance. In other words, given titles
which apparently cue items, such apparent cues do not necessarily alert a
sufficient percentage of respondents to produce deviance.

Prior knowledge required. Deviance sometimes occurred when selection
of the correct answer seemed related to some extent to the possession of
prior factual knowledge. Thus, in a passage describing events and
circumstances several years antecedent to the outbreak of the Revolutionary
War, one of the items involved inferential knowledge based on the date of
the signing of the Declaration of Independence (i.e., "six more years would
pass"; unless students were familiar with historical dates, one of the
distractors on this item, weekends, would be plausible as an indicator
of time).

Colloquial expression. There are no doubt many examples of colloquial
expressions or usage interspersed throughout the passages on the test forms.
One of these examples seems clearly contributory to item deviance. The item
in question, occurring in a historical passage, creates what
may be called, for want of a simpler term, a breach in passage decorum.
That is, the item, "the people of Boston had had it with British rule,"
represents a sudden and inappropriate introduction of a modern colloquialism
into a passage which does not justify such usage.

Typographical error. Defined in Phase 1 under "error," typographical
errors explain some difficult items.

Difficult sentence construction. Many difficult items seem to occur
when words are deleted from complex or unusual sentence constructions.
Example: "But being a calm and quiet young lady, she did not say anything,
although the whole high school buzzed with rumors, guesses, reportedly
authentic announcements on the part of students who had no right to be
making announcements at all." The difficulty of the last deletion, guesses,
seems a result of the appositional construction in which it functions.

Specialized word usage. Most words have more than one meaning or shade
of meaning, and the variant meanings of some words differ substantially.
Further, certain words have technical meanings associated with specific
fields of endeavor. In either kind of instance, correct identification of
such a meaning of a word depends to a certain extent on previously having
experienced the word in its specialized use. Even such a relative
commonplace as "the ragged pen," where pen is used synonymously for corral,
resulted in a difficult item, partially, perhaps, because not everyone knows
that a pen can be a corral.

Inexplicable. Deviant items for which no satisfactory explanation was
discovered were thus categorized. Items so identified represent labored but
unavailing examinations of passages and distractor behavior.

Distractor associated with context. There was one instance of a
deviant item in which selection of a certain distractor was explained in
terms of its association with some words in context. In the sentence, "Then
everything in this world was dark and bitter for the minstrel of the gods,"
the distractor heavenly was frequently chosen. This was explained by the
hypothesis that heavenly and gods have a typical association which, given
gods in context, would make heavenly an attractive choice.

Table 8.3 summarizes by testing level the frequency of interpretations
of item deviancy on the 12 test forms analyzed in Phase 3. A narrative
consideration of Table 8.3 and Table 8.8 in Appendix B now follows.

Of the 31 deviant items on the four test forms analyzed at Level I, 11
items (36%) are easy and 20 (67%) are hard. Of the factors hypothesized as
explaining or contributing to the deviancy of the easy items, common
association of words represents 64% (7 items), titles cueing correct answers
represent 27% (3 items), and idioms represent 9% (1 item). There is some
overlap in the interpretation of the factors contributing to the deviance of
the hard items at Level I. Of the factors hypothesized as explaining or
contributing to the deviancy of the hard items, semantically plausible
distractors represent 45% (9 items), insufficient contextual clues represent
20% (4 items), idioms represent 10% (2 items), and prior knowledge, deleted
word above level of passage, and colloquial expression represent 5% (1 item)
each. The deviancy of five of the difficult items (25%) was inexplicable.

Table 8.3

Frequency of Interpretations of Item Deviancy
on Multiple-Choice Cloze Exercises

Of the 52 deviant items on the four test forms analyzed at Level II,
18 items (35%) are easy and 34 items (65%) are hard. Of the factors
hypothesized as explaining or contributing to the deviancy of the easy items,
common association of words represents 83% (15 items), title cueing correct
answer represents 11% (2 items), and idiom represents 6% (1 item). At
Level II there is also some overlap in the interpretation of factors
contributing to the deviance of hard items. Of the factors hypothesized as
explaining or contributing to the deviancy of the hard items, semantically
plausible distractors represent 32% (11 items), insufficient contextual
clues represent 24% (8 items), difficult sentence constructions represent
15% (5 items), idioms represent 9% (3 items), typographical errors and
specialized word usage represent 6% (2 items) each, and prior knowledge
represents 3% (1 item). The deviancy of five of the difficult items (15%)
was inexplicable.

Of the 44 deviant items on the four test forms analyzed at Level III,
14 items (32%) are easy and 30 items (68%) are hard. At Level III there is
some overlap in the interpretation of factors contributing to the deviance
of both easy and hard items. Of the factors hypothesized as explaining or
contributing to the deviancy of the easy items, common association of words
represents 83% (12 items), and title cueing correct answer and idiom
represent 7% (1 item) each. Further, grammatically implausible distractors
also seem to have a bearing on the deviance of 3 of the 12 items involving
common association of words. Of the factors hypothesized as explaining or
contributing to the deviancy of the hard items, difficult sentence
constructions represent 53% (16 items), semantically plausible distractors
represent 20% (6 items), insufficient contextual clues represent 13%
(4 items), specialized word usage and deleted words above passage level
represent 10% (3 items) each, and idiom represents 3% (1 item). The deviancy
of 6% of the difficult items was inexplicable.
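
Each percentage above is a factor's item count divided by the number of easy or hard deviant items at that level, so where an item carries more than one interpretation the percentages can total more than 100. A minimal sketch of that arithmetic, using the Level II hard-item counts reported above; the dictionary keys and the function name are illustrative and are not part of the original analysis.

```python
# Level II hard deviant items, as reported in the text (34 items in all).
HARD_ITEMS = 34
factor_counts = {
    "semantically plausible distractors": 11,
    "insufficient contextual clues": 8,
    "difficult sentence constructions": 5,
    "idioms": 3,
    "typographical errors": 2,
    "specialized word usage": 2,
    "prior knowledge": 1,
    "inexplicable": 5,
}


def factor_percentages(counts, total_items):
    """Return each factor's share of total_items as a whole-number percentage."""
    return {name: round(100 * n / total_items) for name, n in counts.items()}


percentages = factor_percentages(factor_counts, HARD_ITEMS)

# Interpretations assigned beyond one per item, i.e. the overlap noted in the text.
extra_interpretations = sum(factor_counts.values()) - HARD_ITEMS
```

Here the counts sum to more than the 34 hard items, which is what the text means by overlap in the interpretation of factors.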

Generally speaking, by far the most frequent interpretation
hypothesized in an attempt to explain deviant items which are easy is common
association of words. Such a finding would seem intuitively reasonable;
that is, since there clearly are many combinations of words which occur
repeatedly in written discourse, it is not surprising to find that words in
such combinations produce easy items when they are clozed. Indeed, it is
desirable that a certain proportion of such familiar word combinations be
represented in the cloze passages, for to limit or restrict the
frequency of occurrence of such word combinations in the cloze passages
would lead to the production of highly biased and inordinately difficult
or unusual testing materials.

Three factors seem most frequently to be associated with hard deviant
items: semantically plausible distractors, insufficient contextual clues,
and difficult sentence constructions. The influence and effect of the first
two factors is to a large extent being attended to in the current review of
the total corpus of cloze materials discussed in Phase 1. That is, all
cloze passages and items are in the process of being corrected: semantically
plausible distractors are being replaced and passages which contain items
lacking sufficient contextual clues are being re-clozed.

That difficult or unusual sentence constructions are observed to have
a recurrent relationship to difficult items is also intuitively logical.
Sentence complexity does produce sentence difficulty. Again, since it is
both necessary and desirable that the cloze materials be representative of
the kind of reading materials that students actually encounter, it follows
that a certain proportion of difficult, complex sentence constructions must
occur in the cloze materials. Thus, an effort to avoid or edit out such
sentence constructions would be misguided and counter-productive.

The analysis of item deviance shows, then, that the proportion of
deviant items (approximately 20%) is not large. Current review of the
cloze materials should reduce the proportion of deviant items both by
correcting and by replacing potentially deviant items. And since deviance
related to common word associations and difficult sentence constructions
seems to attest to the relative freedom from bias in the procedures for
selection and production of cloze materials, such deviance is desirable.

It should be noted further that certain aspects of the cloze materials
related to deviance (specifically, common word associations concerning easy
items and difficult or specialized words and difficult sentence constructions
concerning hard items) suggest more about the weaknesses inherent in the
Spache and Dale-Chall readability formulas than they do about the ostensible
weaknesses of cloze materials. These two readability formulas, and perhaps
especially the Dale-Chall formula, are particularly insensitive to the
relative difficulty and easiness of words and to the relative difficulty
of various kinds of sentence constructions. Because of the insensitivity
of these readability formulas, then, passages within the same range of
readability scores may not perform similarly because, despite their
readability scores, they are in fact relatively easier or more difficult.
In other words, the readability scores on such passages are very misleading.
This problem, however, will be rectified with the implementation of Rasch
passage calibration procedures which will rank passages in the cloze corpus
by performance difficulty, thus providing a finer, more sensitive, and,
hence, more practical and useful guide to the selection of cloze passages
for test assembly purposes.
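
The contrast between a readability score and a performance-based ranking can be illustrated with a rough sketch. This is not the Rasch calibration procedure itself; it simply orders passages by the log-odds of an incorrect response, computed from an observed proportion correct (the passage easiness). The passage labels and easiness values are invented for illustration.

```python
import math

# Hypothetical passages sharing one readability grade but differing in observed easiness.
passages = [
    {"id": "A", "readability_grade": 3, "easiness": 0.80},
    {"id": "B", "readability_grade": 3, "easiness": 0.45},
    {"id": "C", "readability_grade": 3, "easiness": 0.62},
]


def log_odds_difficulty(easiness):
    """Log-odds of an incorrect response; larger values mean harder passages."""
    return math.log((1.0 - easiness) / easiness)


# Sort from easiest to hardest by observed performance.
ranked = sorted(passages, key=lambda p: log_odds_difficulty(p["easiness"]))
order = [p["id"] for p in ranked]
```

All three hypothetical passages carry the same readability grade, yet the performance-based ordering separates them, which is the kind of finer distinction the text expects from calibration.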
Deviant Passages

On the basis of inspection of the data (Table 8.7 in Appendix B)
discussed in Phase 2 of the analysis, 11 passages (of the 216 total passages
on the 36 cloze test forms) were identified as deviant. Table 8.4 lists
these deviant passages by testing level, test form number, and TDN passage
identification number. Further, the nature of the deviancy of each
passage is also indicated (i.e., hard or easy). The following discussion
attempts briefly to explain factors contributing to the deviancy of these
passages, and concludes with several observations and recommendations.
Table 8.4

The Nature of Passage Deviancy on Multiple-Choice Cloze Exercises

Level   Form   Passage               Nature of Deviancy
I        3     04-08-01-01-01-020    Hard
II      15     04-09-01-01-05-038    Hard
        16     06-13-01-01-02-029    Easy
        18     04-07-01-01-03-013    Hard
        13     08-15-01-01-03-026    Hard
III     25     08-14-01-01-03-003    Hard
        26     07-13-01-01-01-010    Easy
        26     10-18-01-01-01-008    Hard
        30     08-16-01-01-05-013    Easy
        32     07-13-01-01-01-007    Hard
        33     10-22-01-01-01-029    Hard

Eight of these deviant passages were hard, and three were easy.
Similar factors contributed to the difficulty of six of the eight hard
passages. Thus, five of the six passages featured difficult sentence
constructions, four manifested violations in deletion patterns, three
contained flawed distractors, and two were marred by typographical errors.
Not only, then, were similar factors related to the difficulty of these six
passages, but each of the six contained one or more remediable flaws.
Thus, review should eliminate much of the deviant difficulty of these
passages.

The other two difficult passages were irremediable. One featured
obsolete or unfamiliar vocabulary (e.g., inkwells and goose-quill pens)
and the other was incoherent. These two passages could profitably be
eliminated from the cloze corpus.

The three easy passages all had an uncommon degree of redundancy.
Such redundancy is undetectable by readability formulas, but Rasch
calibration will more precisely calibrate these passages according to
performance difficulty.

Eleven deviant passages from a total of 216 passages bespeaks a high
degree of success in the development of procedures for constructing passages
and test forms which perform consistently. Indeed, of the 11 deviant
passages (three easy and eight hard), only two hard passages, one cluttered
with archaisms and the other incoherent, seem to warrant elimination from
the cloze corpus. No reapplication of cloze procedures would render these
two passages acceptable. But correction of distractor flaws and adjustments
of deletion patterns in the other six hard passages should improve them to
the point where they no longer function deviantly. And further, the three
easy passages will be precisely calibrated during the implementation of the
Rasch model. Thus, given the rules revisions and the current review
process discussed in Phase 1, cloze procedures seem to hold considerable
promise in assuring objectively reproducible test materials.

Item Deviance by Part of Speech

Deviance was analyzed in terms of the part of speech of items in order
to determine whether items involving given parts of speech were
characteristically deviant. If such a pattern were discovered, it might
suggest that the part (or parts) of speech involved were not measuring the
same aspect of reading ability as the other parts of speech. Furthermore,
such a pattern would clearly indicate the need for closer and more detailed
study of the behavior of items by part of speech.

Though the part of speech analysis has thus far been only a
preliminary one, indications are that there is no discernible pattern of
deviance associated with any of the parts of speech in question (i.e., nouns,
verbs, adjectives, adverbs). Part of speech data presented in Tables 8.9
and 8.10 in Appendix B contain no evidence which would implicate any of
these parts of speech as characteristically deviant.

If these brief preliminary observations are confirmed by further study
and analysis, then it may be concluded that cloze items involving nouns,
verbs, adjectives, and adverbs are all relatively equally good and useful
measures of literal comprehension.

Fit Mean Squares and Deviance

The Rasch measurement model is used to analyze test result data in
order to determine whether the testing materials in question (i.e., items
on test forms) are consistently measuring the same phenomenon or phenomena
(in this instance, the reading-related ability called literal comprehension).
Thus, the Rasch model assumes that all items on a test form are measuring
the same characteristic or ability. In order to test this hypothesis, the
model calculates a fit mean square for each item. When the fit mean square
of an item exceeds a given point, the item is said to misfit. This implies
that what the item is measuring seems to be at variance with what the other
items are measuring. Since the cloze test forms are intended as a
unidimensional measure of literal comprehension, it is assumed that fit
mean squares on cloze items will help identify and rectify problem items.

The fit mean square analysis involved identifying every item (on the
same 12 test forms analyzed earlier in Phase 3) with a fit mean square of
3.70 or higher⁴ in order to determine the frequency of item misfit and to
determine whether the misfit of items could be explained in ways similar to
the hypotheses put forth (see Deviant Item Analysis) to explain items
identified as deviant on the basis of variant easiness scores.
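
The text does not give the formula behind the fit mean square, and the statistic produced by the Wright and Mead program may well differ from what follows. As a hedged illustration only, the sketch below assumes the dichotomous Rasch probability of a correct response and an unweighted mean of squared standardized residuals, then applies the 3.70 cutoff cited above. The function names and sample abilities are assumptions made for the example.

```python
import math

MISFIT_CUTOFF = 3.70  # threshold cited in the text


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def fit_mean_square(responses, abilities, difficulty):
    """Unweighted mean of squared standardized residuals for one item.

    responses: 1 (correct) or 0 (incorrect), one per student
    abilities: ability estimates in logits, one per student
    """
    total = 0.0
    for x, ability in zip(responses, abilities):
        p = rasch_probability(ability, difficulty)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)


def is_misfit(mean_square, cutoff=MISFIT_CUTOFF):
    """Flag an item whose fit mean square is at or above the cutoff."""
    return mean_square >= cutoff


# Able students (3 logits above the item) who all miss it produce a large value.
surprising = fit_mean_square([0, 0, 0], [3.0, 3.0, 3.0], 0.0)
# Students matched to the item's difficulty each contribute 1.0.
matched = fit_mean_square([1, 0], [0.0, 0.0], 0.0)
```

Under these assumptions an item is flagged only when responses depart sharply from what the students' abilities predict, not merely because the item is hard or easy.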

Results of the fit mean square analysis of 12 test forms are presented
in Table 8.5. On the four test forms analyzed at Level I, there are 15
misfitting items, or 9% of the total items on the four forms. At Level II,
there are 29 misfitting items, or 12% of the total items. At Level III
there are 28 misfits, also 12% of the total items. On the face of it, then,
there seems to be only a low proportion of item misfits on the 12 test forms.
This would seem to indicate that the cloze testing materials conform to a
unidimensional measurement model. If further, more sophisticated
analyses of misfitting items tend to confirm these preliminary findings, then
the cloze testing materials will provide consistent and precise estimates
of literal comprehension.

It should be noted that factors other than item quality can influence
the fit mean square of an item. To wit, students who perform unusually and
the element of chance can both contribute to high fit mean squares on given
items. Since there are ways of accounting for the influence of such factors
on apparently misfitting items, more detailed study and future experience
⁴Benjamin D. Wright and Ronald J. Mead, in personal consultation,
recommended that a fit mean square of 3.70 or 4.00 be used as the determinant
of item misfit.
Table 8.5

Analysis of Multiple-Choice Cloze Items
With High Fit Mean Squares

Form   Item   Passage easiness
1       3     .75
       34     .32
       33     .32
       37     .32
3      15     .57
        ?     .57
       22     .35
8      17     .45
       35     .41
9      15     .56
       26     .49
       38     .40
       39     .40
       40     .40
       41     .40
14      1     .83
       19     .72
       35     .66
       44     .53
       47     .53
       51     .46

[The item easiness, fit mean square, item deviancy, and part of speech
columns for the rows above, and the remaining rows on this page of the
table, are illegible in the source.]
Table 8.5 (Continued)

        Passage    Item       Fit mean   Item       Part of
Item    easiness   easiness   square     deviancy   speech
 8      .80        .36         5.09      X          1
16      .79        .85         5.34                 3
29      .74        .81         3.70                 ?
37      .59        .14         5.34      X          3
44      .50        .20        10.49      X          3
48      .50        .45         6.04                 2
51      .47        .20        12.47      X          3
56      .47        .29         4.69      X          3
57      .47        .32         3.86      X          1
 1      .74        .90         7.62                 1
 2      .74        .95         3.70      X          2
 6      .74        .37         6.13      X          3
 9      .74        .45         5.00      X          1
12      .61        .37         5.64      X          1
14      .61        .45         5.14                 1
19      .61        .15        22.15      X          1
31      .69        .25         7.79      X          1
50      .48        .53         3.71                 2
 9      .74        .82         3.95                 1
10      .74        .44         8.22      X          3
32      .41        .45         3.88                 2
35      .41        .23         3.77      X          1
38      .41        .21         8.03      X          1
47      .51        .64         4.43                 1
10      .87        .92        15.37                 1
12      .77        .79         5.87                 1
13      .77        .60         6.82                 2
16      .77        .68         7.25                 3
17      .77        .67        14.64                 2
33      .71        .52         4.94      X          3
52      .48        .56         4.03                 4
11      .84        .58         4.61      X          4
20      .84        .79         5.46                 1
22      .61        .41         8.86      X          1
23      .61        .25         6.00      X          1
43      .64        .47         4.08                 4
48      .64        .75         3.85
with the Rasch model may lower even further the percentage of misfitting
items.

It is curious, at first glance, that there is so little overlap between
misfitting items and easiness-deviant items. That is, many high fit mean
square items do not appear deviant in terms of easiness variance, and many
easiness-deviant items are not misfits. Thus, there is no necessary
connection between misfitting items and items deviant in rather traditional
terms. The explanation of this ostensibly strange lack of connection lies
in the fact that the fit mean square is a function of the interaction of
student ability and item difficulty. Thus, even given a flawed item
which is very hard, students in given ability groups may perform according
to statistical prediction. In such a case, the flawed item will not misfit.
Similarly, on an apparently unflawed item students in given ability groups
may not perform according to prediction. Such an item, therefore, would
misfit. One conclusion which seems to follow from these observations is
that since the fit mean square does not consistently identify items flawed
in traditional terms, it should not be used for such identification purposes.
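
One way to see why a very hard item need not misfit: if responses follow the model, the expected squared standardized residual is 1 whatever the gap between student ability and item difficulty. The sketch below checks this numerically under the dichotomous Rasch model. This form of the residual is an assumption made for illustration and may differ from the statistic the analysis program actually computed.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_squared_residual(ability, difficulty):
    """Model-expected squared standardized residual for one student-item pair."""
    p = rasch_probability(ability, difficulty)
    variance = p * (1.0 - p)
    miss = (0 - p) ** 2 / variance  # squared residual if the answer is incorrect
    hit = (1 - p) ** 2 / variance   # squared residual if the answer is correct
    # Weight each outcome by its model probability.
    return (1.0 - p) * miss + p * hit


# A very hard item and a very easy item, each met by a student of average ability.
hard_item = expected_squared_residual(ability=0.0, difficulty=3.0)
easy_item = expected_squared_residual(ability=0.0, difficulty=-3.0)
```

Both values come out at about 1, so difficulty alone does not push an item toward the misfit cutoff; only responses that contradict the model's predictions do.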

There is no necessary conflict here, however. Both the fit mean square
analysis and the more traditional item deviancy analysis are essential to
the refinement and calibration of the cloze materials. Flawed items must be
rectified to the greatest extent possible, and misfitting items must be
studied carefully in order that the measurement properties of the cloze
materials be determined and consistently established.

At this preliminary stage of analysis, inspection of misfitting items
seems to indicate a high degree of item consistency, or unidimensionality.
But a further level of consistency, passage consistency, is desired for
the cloze materials. A method of evaluating the extent of passage
consistency (or unidimensionality), though developed, has yet to
be implemented, and this method will rely heavily on the use of the fit mean
square. Hence the importance of ongoing study into its uses.

Conclusions

The results of the three phases of the analysis of the 36 cloze test
forms are generally very positive. Critical examination of the test forms
independent of test result data has led to the revision of the cloze rules
and to the initiation of a thorough review of all cloze materials. The
rules revision is an essential step toward the eventual development of an
algorithm which will permit maximum computerization and mechanization of
cloze procedures. The revised rules also now make it possible for others
to produce cloze passages and items with high consistency and quality.
Further, the revised rules are a practical guide for the current review of
cloze materials, a review which gives promise of assuring a final product
which is free of flawed passages and items.

The analysis of the consistency of the passage graduation pattern per
test form and of the difficulty (i.e., statistical easiness) of passages by
readability level revealed a generally high degree of success. Variations
in expected and desired patterns are largely attributable to the insensitivity
of the readability formulas used in the initial scaling of passages. The
findings of this phase of the analysis, then, suggest that the cloze test
forms are remarkably consistent measures of literal comprehension. And the
finer calibration of passages anticipated from the application of Rasch
model procedures will eliminate the almost inevitable inconsistencies
in passage scaling resulting from the use of readability formulas.

The analysis of deviant items is positive in several respects. First,
the percentage of deviant items was low. Second, a large proportion of the
flaws associated with deviant items will be corrected during the cloze
review process. Finally, much of the item deviancy testifies to the
relative absence of bias in the passage-selection and item-construction
procedures. That is, the cloze materials seem an accurate sampling of
reading materials actually encountered by students.

Preliminary analyses of passage deviancy, part of speech and deviancy,
and fit mean squares are all encouraging. There were very few deviant
passages, and most of those should be revised or adjusted by the review
process and by the Rasch calibration of passages. That there was no
discernible deviance pattern among any of the four parts of speech upon
which cloze items were based suggests that these four parts of speech are
nearly equally useful as measures of literal comprehension. Fit mean square
analysis revealed a low proportion of misfitting items, thus implying a
high degree of unidimensionality in the cloze test forms.

For an experimental effort, the administration of the 36 cloze test
forms appears to have been highly successful in approaching the development
of a consistent set of testing materials for literal comprehension. And the
knowledge and experience gained through this experimental process promise
greater success in future efforts.

Perhaps the most important foci for the further improvement of cloze
procedures and materials are the effects of titles on cloze passages and
more flexible passage formats (i.e., with greater length and more adequate
context). On the other hand, further study of the function and utility of
fit mean square analysis and implementation of the technique for identifying
passage misfit are essential to the achievement of consistent
unidimensionality among cloze testing materials and to the calibration of
the cloze passages.
CHAPTER IX

RELIABILITY AND VALIDITY OF THE
MULTIPLE-CHOICE CLOZE AND WH-ITEM TESTS

This chapter is concerned with what can be said to date about the
reliability and validity of the literal comprehension measures. Presented
here are the results of an initial exploration of the data from the May-June
1975 test administration first outlined in Chapters V and VI. These
results represent the first stage of the analyses projected for the
Multiple-Choice Cloze Exercises and other testing materials in the preliminary
validation phase of the research (see Figure 6.1 in Chapter VI). This
presentation is intended only as an early indication of the confidence that
might be placed in the testing materials under development here. More
detailed and definitive reports will be available at a later date.

The discussion is organized as a research report that can be largely
read and understood without detailed reference to the remainder of the text.
Accordingly, presented first is a general overview of the study design,
descriptive statistics for the major variables in the study, and a brief
summary of the data analysis procedures.

The presentation of the results that follows begins with an examination
of correlational data that reflect on the comparability of the cloze
and wh-item test forms from two perspectives: (a) as measures of the same
construct of literal comprehension and (b) as parallel test forms. This
section of the results further analyzes the reliability and homogeneity
of all test forms constructed for the research.
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Next, the discussJoTi is concerned with sinnmarizing the results of the 
preliminary applications o£ tUo B,d^ah an,nlvsAs progr™^ provided by Wright 
and Mead (1975), to each of "htt ^^eiai: /r^*rm> ,iHAOTbled in the clo^e and 
';4i-itaTi fomatSe This is Vury lc?,rge set ^iC and a lengthy pr-^* 

sentat^on is not atten^ted here. Inrste^^u^ an attempt is madti iiiimnarlae 
how well, the total data set available on both the cloze and #i-item test 
fomats fits the Rasch measuretnent inodGl. Both the Rasch analyses and 
the more traditional analyses of inte.^u.al consistency referred to in the 
previous paragraph reflect on the unidtoionslonality of the trait measured 
by these tests* In addition^ these analyses provide a broad basis for 
evaluating the extent to which the multiple ^choice cloze item form can be 
applied to sffliiples of written discourse without seriously biasing the 
content of the testing material s» 

The final section of the report presents the intercorrelations of
scores on the literal comprehension measures and scores from the reading
and language sections of the achievement test used in the study sample.
Also included in the intercorrelation matrix are measures of verbal and
non-verbal IQ and a measure of each student's test-wiseness. Together,
these intercorrelations provide a rich context for exploring the construct
validity of the cloze and wh-item tests, an exploration which begins only
in brief with this report.^1

Organization of the Study

The procedures for assembling the literal comprehension measures for
this research were outlined in Chapter V as part of the presentation of
potential applications of the cloze exercises. As reported in this
discussion, a battery of multiple-choice cloze and wh-item test forms was
assembled and administered to approximately 5,000 students distributed more
or less evenly over grades 1-9. These tests were administered in May-June
of 1975, in conjunction with the school district standardized testing
program, thus making available additional test scores that could be used to
explore the validity of the new tests of literal comprehension. The
characteristics of the various tests that were used in the analyses are
summarized below.

^1 More detailed analyses, to be conducted shortly, will examine the
correlations of individual items on the California Achievement Test with
multiple-choice cloze and wh-item scores.

Multiple-Choice Cloze Test

The multiple-choice cloze tests were assembled by systematically drawing
passages from the Test Development Notebook (TDN) in the sampling
design previously illustrated in Figure 5.5 in Chapter V. This design
produced 3 sets of parallel test forms, with 12 forms in each of 3
successive test levels. A test form in Level I contained 6 passages,
ranked by readability level, and 39 or 41 items. A test form in Levels II
and III contained 6 passages, ranked by readability level, and 60 items.
Test forms were randomly and evenly distributed across the student
populations as follows: Level I, grades 1-3; Level II, grades 4-6; and
Level III, grades 7-9.

Wh-Item Test

The wh-item tests were developed as an alternative measure of literal
comprehension and assembled following a design that was virtually identical
to that used in assembling the multiple-choice cloze tests. Like the cloze
tests, each wh-test form contained 6 passages ordered in a fixed range of
readability for each test level. However, the wh-item tests differed from
the multiple-choice cloze tests in number of items per test form and in the
item format. The wh-item tests each contained 30 multiple-choice test items,
5 for each passage. Each question was a verbatim transformation of a
statement in the associated passage, based on one of the eight wh-item types
(i.e., how, what [noun], what [verb], when, where, which, who, why).

Test-Wiseness Test

Because multiple-choice tests of reading comprehension are vulnerable
to a form of test-wiseness referred to as "passage independence," a special
test was constructed to measure some aspects of this characteristic.
Referred to as the test-wiseness test, the design of these tests paralleled
the cloze and wh-item test form designs. The wh-items in each of the three
test levels were pooled and systematically assigned in units of 12 to each
test form, such that no test items in this part of a given test were
referenced to the passages on that test. Care was also taken to represent
the passage difficulties and types of wh-items in a test level in an attempt
to create parallel test forms. The relationship between scores on this
test-wiseness measure and scores on the wh-item test provides some indication
of the extent to which students' responses on the latter test are dependent
on reading the associated test passages. This test also provides some
indication of the extent to which this form of test-wiseness affects
responses on the multiple-choice cloze test. However, it is probable that
the cloze test is vulnerable to other forms of test-wiseness, a possibility
that should be investigated directly.

Short Form Test of Academic Aptitude

The Short Form Test of Academic Aptitude (SFTAA) is a group-administered
intelligence test that yields language and non-language IQ's. This
test, administered by the school district along with the California
Achievement Test, permitted study of the relationship between IQ and the
literal comprehension tests across the study subsample.

California Achievement Test

The California Achievement Test (CAT) was administered to students in
grades 1-8. The CAT has four levels, as follows:

    CAT Level    Grade(s)
    I            1
    II           2-3
    III          4-6
    IV           7-8

Each CAT level provides scores on the following major skills:

1. Reading Vocabulary,

2. Reading Comprehension,

3. Language Mechanics, and

4. Language Usage.

In addition, the CAT provides reading and language subtest scores by
test level as shown in Table 9.1.

Table 9.1
CAT Subtests by Test Level

Columns: CAT Level I, II, III, IV (an X marks each level at which the
subtest is scored)
[The X entries for individual levels are not legible in the source scan.]

Subtest
    Sentence-Picture Association
    Beginning Sounds
    Ending Sounds
    Letter Recognition
    Word Form
    Picture-Word Association
    Word Recognition
    Words in Context
    Facts
    Interpretation
    Relationships
    Generalizations
    Inferences
    Reading-General
    Reading-Social Studies
    Reading-Science
    Reading-Mathematics
    Standard English
    Sentence Structure
    Sentence Parts and Functions
    Transformations

All of these skill and subtest scores for the CAT were used to explore
the meaning of the tests of literal comprehension developed for this study.



The data set available on the foregoing test scores was organized for
analysis by CAT level and by wh-item and multiple-choice cloze test level.
There were four such subgroups: (a) students in grade 1 who took Level I
on the CAT and Level I on the wh-item and multiple-choice cloze tests
(N = 456); (b) students in grades 2 and 3 who took Level II on the CAT and
Level I on the wh-item and multiple-choice cloze tests (N = 972); (c)
students in grades 4, 5, and 6 who took Level III on the CAT and Level II
on the wh-item and multiple-choice cloze tests (N = 1,?99); and (d) students
in grades 7 and 8 who took Level IV on the CAT and Level III on the wh-item
and multiple-choice cloze tests (N = 594).^2

^2 The ninth grade is not included in the main analysis because the CAT
was not given at this level.

To make this analysis possible, the raw scores for the wh-item and
multiple-choice cloze test forms were converted to z scores based on the
score distribution for each test in a test level. Subsequently, negative
values were eliminated by applying a linear transformation to each set of
z scores. The resultant scores from any of the 12 wh-item and
multiple-choice cloze tests in a test level were thereafter treated as
having come from equivalent test forms and were combined as required for the
analyses by CAT level. This approach to test equating, though somewhat
unorthodox, is defensible on several grounds. The general shape, means,
and the standard deviations of the distributions of wh-item and
multiple-choice cloze test scores were very similar from form to form in a
test level (usually the average raw score difference from form to form was
less than one-fourth of a standard deviation), the internal reliabilities of
each form were consistently high, and the tests had been systematically
assembled to be parallel in order and range of readability level.^3
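
The score conversion described above can be sketched in a few lines. This is a minimal illustration and not the project's actual scoring routine: the function name `standardize`, the use of the population standard deviation, and the target mean of 50 and standard deviation of 10 (taken from the table notes in this chapter) are assumptions.

```python
from statistics import mean, pstdev


def standardize(raw_scores, target_mean=50.0, target_sd=10.0):
    """Convert raw scores on one test form to z scores, then rescale.

    The linear rescaling removes negative values, as described in the
    text. The population SD is an assumption of this sketch.
    """
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]


# Hypothetical raw scores from a single test form.
form_scores = [10, 20, 30]
scaled = standardize(form_scores)
```

Scores rescaled form by form in this way can then be pooled across the forms of a test level, which is the step the text defends as a form of test equating.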

The comparability of the cloze and wh-item tests as measures of a
common construct of literal comprehension was examined by intercorrelating
the various subscores and total scores on these tests by test level. The
test-wiseness measure was included in these analyses. The reliability of
all three types of tests was estimated by applying the Kuder-Richardson
Formula 20 to the 108 available test forms.

The findings from the Rasch analyses of the wh-item and multiple-choice
cloze tests are constrained to a summary of the fit of the available data to
the Rasch measurement model. (The complete Rasch analysis^4 on the 72
wh-item and cloze test forms is voluminous. These analyses produced detailed
item statistics, estimates of Rasch difficulty values for each item,
ability estimates associated with each test score, fit mean squares for
each item within specified ability groups, and bivariate plots of major
item statistics.) The fit of the data to the model is determined by the
mean and standard deviation of the fit mean square values within and across
student ability groups. If the data fit the Rasch model, it can be concluded
that the variable being measured is unidimensional. Of particular interest
was the consistency of fit across forms and grade levels, which would reflect
the stability of the trait when measured by virtually any systematically
ordered test form that might be assembled from the multiple-choice cloze and
wh-item passage and item pools.

^3 This same procedure was followed for the test-wiseness test forms,
but here the justification for equivalence of test forms is less adequate.

^4 A description of a complete Rasch analysis is provided in Chapter VII
of this proposal.
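
For readers unfamiliar with the fit statistic mentioned above, the sketch below shows one common way an item fit mean square is formed under the Rasch model: the mean of the squared standardized residuals. It is an illustration under stated assumptions only. The Wright and Mead program computes its statistics within ability groups and may differ in detail, and the names `rasch_probability` and `item_fit_mean_square` are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit_mean_square(responses, abilities, difficulty):
    """Mean squared standardized residual for one item.

    responses: 0/1 scores of each person on the item
    abilities: the ability estimate of each person (same order)
    """
    total = 0.0
    for x, b in zip(responses, abilities):
        p = rasch_probability(b, difficulty)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)


# Hypothetical data: two persons whose ability equals the item difficulty.
fit = item_fit_mean_square([1, 0], [0.0, 0.0], 0.0)
```

Values near 1.0 are what the model predicts when the data fit; markedly larger values flag items whose responses are less consistent with a single underlying trait.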

The construct validity of the multiple-choice cloze test is studied
in the final analytical section of this chapter. The analysis is logical
and correlational. Zero-order, Pearson product-moment correlations are
used throughout this construct validation section. A priori specifications
are made about the expected correlations between the multiple-choice cloze
test and the wh-item test with all of the skill and subscores derived from
the CAT. When expected correlations are not obtained, a detailed study is
made of the possible causes for the unexpected results. These detailed
studies focus directly on analyzing the item content of the CAT subtest in
question in relation to the description of the construct of literal
comprehension provided here.

To provide a basis for interpreting the results that follow, Table 9.2
displays the means and standard deviations of scores on the major variables
under consideration. It will be observed that the multiple-choice cloze,
wh-item, and test-wiseness tests were standardized (not normalized) within
level. The major subtests on the CAT are normalized. The derived subscores
on the CAT are raw scores.

Table 9.2

Means and Standard Deviations of Scores on the Multiple-Choice Cloze,
Wh-Item, Test-Wiseness, California Achievement, and Short Form Test of
Academic Aptitude Tests

Columns: mean and SD at CAT Levels I, II, III, and IV(a)
[Cell values are not legible in the source scan.]

Rows:
    Multiple-Choice Cloze Test
    Wh-Item Test
    Test-Wiseness Test
    California Achievement Test
        Reading Vocabulary-ADSS
        Reading Comprehension-ADSS
        Language Mechanics-ADSS
        Language Usage-ADSS
    Language IQ-SFTAA
    Non-Language IQ-SFTAA
    Vocabulary Subscores
        Sentence-Picture Association
        Beginning Sounds
        Ending Sounds
        Letter Recognition
        Word Form
        Picture-Word Association
        Word Recognition
        Words in Context
    Comprehension Subscores
        Facts
        Interpretation
        Relationships
        Generalizations
        Inferences
        Reading-General
        Reading-Social Studies
        Reading-Science
        Reading-Mathematics
    Standard English
    Sentence Structure
    Sentence Parts and Functions

Notes.
a  Grade 8 students only.
b  [...] Level IV plus grade 9) to have a mean of 50 and standard
   deviation of 10.
c  Score scale across all grades for use with all levels and forms of
   the CAT.
d  Scores obtained from the Short Form Test of Academic Aptitude given
   with the CAT.
e  All California Achievement Test subscores are raw scores.

There are several characteristics of this data set that must be taken
into consideration when interpretations are made of the zero-order, Pearson
product-moment correlations between variables. The construct validity
sections of this chapter are based entirely on these correlations.

It is well-known that the reliability of a test as well as the shape
of its score distribution affects its correlation with another test or
measure. Reliability, as shown in the next section, does not contribute
greatly to ambiguity in interpreting the results reported here. The wh-item
and cloze tests meet the most conservative standards of reliability.
The test-wiseness test is sufficiently reliable for the purpose here.
A few of the CAT subtests should be suspect because of length; they are, at
Level III, Relationships (4 items), Inferences (4 items), Sentence
Structure (5 items), and Transformations (5 items), and, at Level IV, Facts
(5 items). The potential low reliability of these subtests is noted in
appropriate tables. The subscores for the wh-item tests, which are based
on small numbers of each of the wh-item types in the test, and the deletion
type subscores for adjectives and adverbs on the cloze test, which were
typically based on very few items, undoubtedly contribute to reduced
correlations among the subscores and total scores for these tests. However,
these relationships were as expected, and they do not seem to have
contributed to interpretive problems, particularly when the high degree of
internal consistency of the wh-item and cloze tests is shown later with
other methods.
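
The way unreliability lowers an observed correlation is usually expressed with the classical (Spearman) attenuation formula, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below illustrates that standard formula only; the report does not state that any such correction was applied, and the function names and numbers are hypothetical.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Observed correlation expected for a given true correlation."""
    return true_r * math.sqrt(reliability_x * reliability_y)


def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Estimate of the true correlation from an observed one."""
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical case: a perfectly related pair of traits, one measured by a
# short subtest with reliability .81 and the other measured without error.
observed = attenuated_correlation(1.0, 0.81, 1.0)
recovered = disattenuated_correlation(observed, 0.81, 1.0)
```

With highly reliable total scores the square-root factor is close to 1, which is why the text treats unreliability as a minor concern for total scores and a larger one for subscores built from four or five items.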

A minor problem for interpretation of the correlational results derives
from the test score distributions of the cloze and wh-item tests, which are
generally skewed in the negative direction. This skewness is the result
of generally easy tests which interact with grade level in this case to
produce more extreme negative skewness at the upper grades of a test level
and also across the grades of the total study sample. The general result
of this skewness is a frequent lack of homoscedasticity at the upper score
levels in the bivariate plots of the test score distributions in the present
study. This effect appears to have produced attenuation in certain of the
correlations at certain levels of the study sample, which is predictable as
follows.

In grades 1, 2, and 3, and particularly in grade 3, the wh-item and
cloze tests were considerably more difficult for students than in other
grades. As a result there was no artificial ceiling on the tests, and
students' scores were spread more evenly over the score ranges for the two
tests. Given the high reliability of the Level I wh-item and cloze tests
for grades 1, 2, and 3, the correlations of the Level I cloze and wh-item
tests with each other and with the CAT Level I and II skill and subtest
scores should be close to the "true" correlations among these variables.

In CAT Levels III and IV of the study sample, the degree of
negative skewness on the wh-item and cloze tests is more marked, with the
Level IV data being the most affected. At these test levels, the attenuating
effects of skewness on the correlations involving the wh-item and cloze
tests should be noticeable, particularly at Level IV. The overall effect
of the skewed distributions of the scores on the two measures of literal
comprehension constructed for the study is to reduce the validity
coefficients between the cloze and wh-item tests and between these tests
and the various CAT scores in the upper levels of the study sample.

At Level IV, the validity coefficients are further reduced because of
an undetected error in the construction of the original CAT data tape.
Grade 7 data were misplaced, so that the Level IV correlational data are
based solely on grade 8 students. This results in a severe restriction
in the range of student ability at Level IV with an associated reduction
in variability. This reduction in variability is the major cause for the
low, zero-order correlations at CAT Level IV.
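
The effect of a restricted range on a correlation can be shown with a small deterministic example. The data below are invented for illustration and have no connection to the study sample; `pearson_r` is a plain implementation of the product-moment formula, and the alternating +3/-3 term stands in for measurement error.

```python
import math


def pearson_r(xs, ys):
    """Zero-order Pearson product-moment correlation."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical "ability" values 0..19 and a score that tracks ability
# with a fixed amount of error added or subtracted.
ability = list(range(20))
score = [a + (3 if a % 2 == 0 else -3) for a in ability]

r_full = pearson_r(ability, score)  # about .88

# Keep only the middle of the ability range (values 5..14), as happens
# when a single grade stands in for a multi-grade level.
r_restricted = pearson_r(ability[5:15], score[5:15])  # about .75
```

The error term is the same size in both samples; only the spread of ability changes, and the correlation drops accordingly.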

Test Comparability and Reliability

The correlations among subscores and total scores within and between
the three types of tests specially constructed for this research are
presented in Tables 9.3, 9.4, and 9.5. The three total test scores are
based on the wh-item test, the multiple-choice cloze test, and the
test-wiseness test. The subscores for the multiple-choice cloze test are
each based on one of the four grammatical parts of speech deleted in a
passage. For example, each test form has a subscore for the sum of all
correct noun responses. The subscores for the wh-test are based on the
various types of wh-items included in the test. The test-wiseness test has
only a total score.

Table 9.3

Intercorrelations of Multiple-Choice Cloze, Wh-Item, and Test-Wiseness
Tests: Total Scores and Subscores

Level I, Grades 1, 2, and 3 (N = [illegible])

Variables (rows and columns 1-15):
    MCC Test:            1 Noun, 2 Verb, 3 Adjective, 4 Adverb, 5 Total
    Wh-Item Test:        6 How, 7 What (N), 8 What (V), 9 When, 10 Where,
                         11 Which, 12 Who, 13 Why, 14 Total
    Test-Wiseness Test:  15 Total
    Mean and SD are given in the last two rows.

[Matrix entries are not legible in the source scan.]

Note. Scores were standardized across grades 1, 2, and 3 to have a mean
of 50 and S.D. of 10.

Table 9.4

Intercorrelations of Multiple-Choice Cloze, Wh-Item, and Test-Wiseness
Tests: Total Scores and Subscores

Level II, Grades 4, 5, and 6

Variables (rows and columns 1-15): as in Table 9.3.

[Matrix entries are not legible in the source scan.]

Note. Scores were standardized across grades 4, 5, and 6 to have a mean
of 50 and S.D. of 10.

Table 9.5

Intercorrelations of Multiple-Choice Cloze, Wh-Item, and Test-Wiseness
Tests: Total Scores and Subscores

Level III, Grade 8(a)

Variables (rows and columns 1-15): as in Table 9.3.

[Matrix entries are not legible in the source scan.]

Note. Scores were standardized across grades 7, 8, and 9 to have a mean
of 50 and S.D. of 10.
a  Grades 7 and 9 are not included.

Tables 9.6 and 9.7 present, respectively, the means and standard
deviations and the Kuder-Richardson Formula 20 reliability coefficients
for each of the 36 forms of the three types of tests. All tables (9.3-
9.7) are organized by test level on the wh-item and cloze tests (Level I
for grades 1-3; Level II for grades 4-6; and Level III for grades 7-9).^5

Test Comparability

The issue of test comparability has two important facets of concern
here. The first is the purely technical one of determining whether the
various test forms constructed for the research were reasonably parallel
and thus could be combined in the analyses by subsample level.

The other facet of test comparability has to do with determining the
extent to which the wh-item and multiple-choice cloze tests measure the
same thing. These tests were originally constructed as alternate methods
of measuring the same construct, literal comprehension. Tables 9.3-9.5 are
relevant to the latter concern, though these data are not a basis for strong
inferences about the parallelism of the constructs measured by the wh-item
and multiple-choice cloze tests. The intercorrelations in Tables 9.3-9.5
are presented in three sections. For example, Table 9.3 presents the
intercorrelations of the deletion type subscores for the Multiple-Choice
Cloze Exercises in the upper left-hand triangular section of the matrix;
the wh-item subscores for the wh-item test are in the lower-right triangle;
and the intercorrelations of the subscores of the two tests are presented
in the lower-left rectangular section of the table. The correlations of
the subscores of both tests with the total test scores are given in rows
5 and 14 of the matrix. The correlations of all scores with the
test-wiseness score are given in the last row of the table. These data
support the following observations in the Level I sample:

1. The deletion type subscores for the multiple-choice cloze
test forms are moderately to highly intercorrelated, except
for the adverb score, which tends to have low correlations
with the other subscores.

2. The noun score correlates highest with all other deletion
type subscores and with the total test. (The correlation
of nouns with the total test is particularly high, but this
is due to the fact that the test is largely composed of noun
responses at Level I.)

3. Although the wh-subscores are each based on only a few items
(the average equals 3.5), these subscores are consistently
and moderately interrelated.

4. The wh-item subscores generally correlate at about the same
level with the total wh-test score.

5. The correlations of noun or verb scores with the wh-item
subscores are generally as high as the intercorrelations
of the wh-item subscores.

6. The correlation between the total scores on both the
multiple-choice cloze and wh-item tests is .81, a level
at which either test is fairly predictable from the other.
(If these tests were two norm-referenced measures of
reading comprehension, this correlation would be interpreted
as a very acceptable validity coefficient.)

7. The test-wiseness score has a significant but very low
correlation with the other test scores.

^5 Table 9.5 is for test level III, but the data are for grade 8 only;
not all of the correlational analyses were completed in the 7-9 sample at
the time of this writing.



The foregoing observations hold for the other levels of the
multiple-choice cloze and wh-item tests given in Tables 9.4 and 9.5, with a
few important qualifications. The correlations of adjective and adverb
scores with subtest scores and total test scores on the multiple-choice
cloze test improve across grades 4-8, due apparently to changes in the
composition of passages at these more advanced reading levels. (The mean
and variance of adverb and adjective subscores increase, but the total test
is still predominantly noun and verb.) The correlations between subscores
and total scores on the wh-item and multiple-choice cloze tests tend to
decrease across grade levels, with this decrease being most marked in the
grade 8 sample (r between total test scores is .56, down from .81 in the
Level I subsample).

These data lead to the conclusion that the two types of literal
comprehension tests assembled for the research are generally consistent
within themselves across the study sample. The pattern of the
intercorrelations within and between test types is further consistent with
the interpretation that the subscores in either test contribute to a single
factor and this factor is common to both tests. However, the commonality
that appears evident between the two tests is substantially reduced in the
upper grade samples, an effect that is apparently attributable to the shapes
of the score distributions at these levels and reduction in range of talent.

The other factor of test comparability of interest here may be evaluated
in part by reference to Table 9.6, which presents the means and standard
deviations for each test form in the wh-item and multiple-choice cloze
formats. In addition to these data, the proportions of correct responses
for each item in each test form were arrayed and visually inspected, as
were the raw score distributions on each test form. Inspection of the
means and standard deviations of each set of test forms shows that the
difference between any pair of test-form means is relatively small (e.g.,
for the Level I multiple-choice cloze test forms, the largest difference
between means is less than one-third of the average standard deviation;
for the Level I wh-item test, the average difference between means is less
than one-fourth of the average standard deviation). These data, taken
together with the examination of the score distributions on each wh-item
and multiple-choice cloze test form, indicate that the various forms of
each test type in a test level were fairly comparable. Given the other
properties of each test form (e.g., progressive ordering of passages by
readability level), it then appears that the various test forms of each
type in a level would result in fairly comparable scaling of students at
each level of the wh-item and multiple-choice cloze tests.

Table 9.6

Means and Standard Deviations for the Multiple-Choice Cloze and Wh-Item
Tests

                    Multiple-Choice Cloze            Wh-Item
Level (Grades)      Form    N     Mean     SD        Form    N     Mean     SD

I (1, 2, 3)           1    12?    [illegible] 10.55   37    127    [illegible] [illegible]
                      2    126    20.26   10.49       38    124    19.52    7.32
                      3    130    21.51   10.59       39    126    19.57    7.29
                      4    124    22.44   11.34       40    121    19.02    7.30
                      5    126    23.06   11.24       41    119    18.43    7.72
                      6    126    19.71    9.24       42    122    19.17    7.00
                      7    127    21.47   11.22       43    124    19.05    7.28
                      8    127    18.84   10.40       44    124    19.20    7.53
                      9    129    21.98   11.31       45    131    19.10    7.69
                     10    127    20.47   10.43       46    123    19.19    7.47
                     11    120    23.39   11.25       47    121    18.65    7.18
                     12    123    22.67   11.41       48    121    20.12    7.69

II (4, 5, 6)         13    147    41.46   11.45       49    147    22.74    5.57
                     14    151    40.01   14.11       50    153    21.95    5.46
                     15    153    38.73   12.51       51    148    22.72    4.85
                     16    152    40.99   11.62       52    152    22.74    5.66
                     17    146    42.18   12.60       53    145    23.52    5.46
                     18    151    36.35   11.03       54    144    23.19    5.45
                     19    152    41.80   13.48       55    145    22.96    5.00
                     20    148    42.00   12.08       56    149    22.60    5.96
                     21    152    41.39   11.37       57    147    20.76    6.11
                     22    152    39.63   13.57       58    148    22.19    5.86
                     23    148    41.72   12.99       59    157    23.76    5.51
                     24    149    39.01   13.32       60    145    21.87    5.73

III (7, 8, 9)        25    167    36.60   12.53       61    163    23.61    5.54
                     26    164    36.44   11.69       62    162    23.89    7.01
                     27    160    38.86   14.33       63    164    24.25    5.79
                     28    161    40.47   12.82       64    161    23.89    4.83
                     29    158    39.17   11.52       65    165    23.53    4.75
                     30    165    42.54   13.35       66    166    21.20    6.20
                     31    158    39.46   12.45       67    154    24.88    4.85
                     32    163    37.07   12.01       68    163    22.40    5.65
                     33    166    37.38   11.98       69    164    24.02    4.99
                     34    159    38.08   13.60       70    156    22.01    5.31
                     35    153    37.82   13.18       71    163    23.16    5.94
                     36    165    41.82   12.56       72    154    22.03    6.90
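
The form-comparability check described above, in which differences between form means are expressed as a fraction of the average standard deviation, can be sketched as follows. The function name `mean_spread_in_sd_units` and the three-form data set are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from Table 9.6.

```python
def mean_spread_in_sd_units(form_means, form_sds):
    """Largest difference between form means, in average-SD units."""
    largest_difference = max(form_means) - min(form_means)
    average_sd = sum(form_sds) / len(form_sds)
    return largest_difference / average_sd


# Hypothetical means and standard deviations for three parallel forms.
means = [20.0, 21.0, 22.0]
sds = [10.0, 10.0, 10.0]
spread = mean_spread_in_sd_units(means, sds)
```

A small ratio indicates that a raw score means roughly the same thing on any form of the level, which is the property the pooled analyses rely on.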
Test Reliab 11 It^ 

The reliability of a test is of interest because it estimates the
amount of random error contained in a test score. The validity coefficients
reported here for the wh-item, multiple-choice cloze, and test-wiseness
tests cannot be interpreted satisfactorily without an appropriate estimate
of measurement error. The reliability statistic selected for this purpose
was the Kuder-Richardson Formula 20 or KR-20 (Kuder and Richardson, 1937),
which is a special case of the Hoyt (1941) or Cronbach (1951) coefficients
of equivalence when test items are scored dichotomously. As used here,
KR-20 reflects directly on several other properties of the cloze and
wh-item tests. The formula provides an estimate of the homogeneity of the
items in the test or the proportion of test variance attributable to the
first general factor in the test. In addition, the KR-20 formula provides
a good estimate of the short-term stability of the test. And, since the
KR-20 is available here on a large number of test forms systematically
assembled from the TDN, the average or median value of the KR-20 is also
a basis for judging the reliability of the process of test generation.
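
As an illustration of the statistic described above, the following minimal Python sketch computes KR-20 from a matrix of dichotomously scored items and derives a standard error of measurement from it. The function names and any response data passed in are assumptions introduced for illustration and are not taken from the study. The use of the population (divide-by-N) variance is also an assumption, since the report does not state which variance estimator was used.

```python
from math import sqrt


def kr20(responses):
    """Kuder-Richardson Formula 20.

    `responses` is a list of examinee rows; each row is a list of 0/1 item
    scores (1 = correct). Hypothetical helper, not the study's program.
    """
    n_people = len(responses)
    n_items = len(responses[0])

    # Variance of total scores (population form, an assumption).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


def standard_error(sd_total, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd_total * sqrt(1.0 - reliability)
```

With the Form 61 wh-item values reported in this chapter (standard deviation 5.54, KR-20 of .91), `standard_error(5.54, 0.91)` gives about 1.66, which agrees with the SE entry for that form in Table 9.7.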

Accordingly, the KR-20 also reflects on the validity of the wh-item
and multiple-choice cloze tests. The description of the construct indicates
the test is a measure of a homogeneous trait that should be stable over
short periods of time. Inspection of the KR-20 coefficients in Table 9.7
for the multiple-choice cloze and wh-item tests indicates that reliability
expectations for these tests were confirmed at a very high level. The
median KR-20 value for the 12 cloze test forms at each test level was
.96; for the wh-item test, the KR-20 ranged from .91 to .93. These
data provide further support for the conclusion that each of the two
measures of literal comprehension is a highly reliable estimate of a single,
homogeneous trait. The tests assembled to measure the trait should thus
scale individuals similarly over short periods of time, and it appears that
similar tests for measuring this trait can be repeatedly assembled.

A final point of comment concerns the reliability of the test-wiseness
test forms assembled for each test level. Across all test levels, 4 of
these test forms had low reliabilities (Forms 40, 45, 47, and 53), but given
the shortness of these test forms and their unusual composition, the KR-20's
are surprisingly high. These results are similar to Tuinman's (1973) for
a similar test he referred to as a measure of passage independence.

Rasch Analysis 

As explained in Chapter VII, complete analyses using the Rasch measure-

^This inference is supported directly and indirectly by the data
reported here and by the consistency of the results reported in the next
section of this report.






Table 9.7

Kuder-Richardson Formula 20 Reliability Coefficients for the
Multiple-Choice Cloze, Wh-Item, and Test-Wiseness Tests

  Multiple-Choice Cloze Test       Wh-Item Test                     Test-Wiseness Measure
Form    N     I   KR-20    SE      Form    N     I   KR-20    SE    Form    N     I   KR-20

   1   128   41    .94    1.73      37    127   30    .92    2.02    37    127   12    .76
   2   126   41    .95    1.64      38    124   30    .94    1.79    38    124   12    .68
   3   130   41    .96    1.46      39    126   30    .90    2.30    39    126   12    .68
   4   124   41    .96    1.46      40    121   30    .90    2.31    40    121   12    .47
   5   126   41    .95    1.73      41    119   30    .91    2.32    41    119   12    .67
   6   126   39    .95    1.57      42    122   30    .91    2.10    42    122   12    .79
   7   127   41    .96    1.45      43    124   30    .93    1.92    43    124   12    .68
   8   127   39    .96    1.51      44    124   30    .90    2.35    44    124   12    .51
   9   129   41    .97    1.33      45    131   30    .90    2.43    45    131   12    .26
  10   127   41    .96    1.49      46    123   30    .91    2.24    46    123   12    .78
  11   120   41    .96    1.43      47    121   30    .94    1.76    47    121   12    .13
  12   123   41    .96    1.54      48    121   30    .92    2.18    48    121   12    .76
Median             .96    1.49                        .91    2.21                      .68

  13   147   60    .97    1.98      49    147   30    .93    1.47    49    147   12    .50
  14   152   60    .96    2.82      50    153   30    .93    1.44    50    152   12    .75
  15   153   60    .95    2.50      51    148   30    .90    1.53    51    148   12    .68
  16   152   60    .96    2.32      52    152   30    .86    2.11    52    152   12    .74
  17   146   60    .97    2.18      53    145   30    .93    1.44    53    145   12    .29
  18   151   60    .94    2.69      54    144   30    .92    1.54    54    144   12    .76
  19   152   60    .97    2.33      55    145   30    .85    1.94    55    145   12    .46
  20   148   60    .95    2.69      56    149   30    .95    1.33    56    149   12    .73
  21   152   60    .95    2.53      57    147   30    .94    1.50    57    147   12    .71
  22   152   60    .97    2.35      58    148   30    .91    1.76    58    148   12    .58
  23   148   60    .97    2.25      59    157   30    .94    1.35    59    157   12    .66
  24   149   60    .95    2.97      60    147   30    .93    1.51    60    145   12    .76
Median             .96    2.35                        .93                              .70

  25   167   60    .96    2.51      61    163   30    .91    1.66    61    163   12    .55
  26   164   60    .95    2.61      62    162   30    .94    1.71    62    162   12    .70
  27   160   60    .96    2.87      63    164   30    .96    1.16    63    164   12    .77
  28   161   60    .97    2.22      64    161   30    .89    1.60    64    161   12    .78
  29   158   60    .95    2.30      65    163   30    .89    1.57    65    165   12    .59
  30   165   60    .97    2.31      66    166   30    .94    1.52    66    166   12    .77
  31   158   60    .95    2.78      67    154   30    .96    0.97    67    154   12    .75
  32   163   60    .96    2.40      68    163   30    .90    1.78    68    163   12    .79
  33   166   60    .95    2.67      69    164   30    .96    0.99    69    164   12    .77
  34   159   60    .95    3.03      70    136   30    .93    1.40    70    156   12    .70
  35   163   60    .97    2.28      71    163   30    .95    1.32    71    163   12    .75
  36   165   60    .97    2.17      72    154   30    .95    1.56    72    154   12    .72
Median             .96    2.40                        .94                              .75

All forms
Median             .96
Mean               .96                                .92
Range          .94-.97                            .85-.96                          .13-.79

N = number of subjects.
I = number of items.


ment model were calculated on all forms of the multiple-choice cloze and
wh-item tests. These analyses provide evidence for answering the following
questions: Do the multiple-choice cloze and wh-item tests measure one
trait? Is the measurement of this trait consistent across grade levels?
As noted previously, both tests were designed to measure one trait, literal
comprehension. The Rasch analysis provides a further test of this
assumption as well as additional evidence on the generalizability of the
cloze item form to levels of written discourse.

The Rasch model specifies a particular simple relationship
between person ability, item difficulty, and the probability
of observing a correct response. The implications of this
specification are that:

1) the variable measured is unidimensional;

2) there are no strong relationships among the persons or
items other than those specified by the model so that
responses of persons to items are stochastically
independent given their parameters in the model;

3) items and persons do not differ substantially with
respect to other possible response factors not repre-
sented in the model such as item discrimination, person
sensitivity, guessing or indifference (Wright and
Mead, 1975, p. 8).
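
The "particular simple relationship" quoted above is conventionally written as a logistic function of the difference between person ability and item difficulty. The following minimal Python sketch assumes both parameters are expressed in logits. The function name is hypothetical, and the quotation itself does not give the formula.

```python
from math import exp


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are assumed to be in logits. The probability depends
    only on their difference, which is what makes the measured variable
    unidimensional in the sense of condition 1.
    """
    return 1.0 / (1.0 + exp(-(ability - difficulty)))
```

When ability equals difficulty the probability is one half; it rises toward 1 as ability exceeds difficulty and falls toward 0 in the opposite case.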

If the data analyzed for the present study fit the Rasch measurement model,
then these three conditions of the model must have been satisfied in the
available response data. More specifically, if the data on the multiple-
choice cloze and wh-item tests fit the Rasch measurement model, then it can
be concluded that the variable being measured by each test is unidimensional.

Table 9.8 displays the mean and standard deviation of the fit mean
square statistics for all items in each form of the multiple-choice cloze
test. This average fit mean square is calculated from each item fit mean
square, which is the appropriate statistic for testing the fit of each item
to the Rasch model. These mean fit mean square statistics have expected

Table 9.8



Mean and Standard Deviation of the Fit Mean Square Statistics for
Each Form of the Multiple-Choice Cloze Test



Form 


N 




X 




1 ,60 






1 .^2 






1*7^ 


/, 




2*00 


J 




1,^6 


- O 




1*56 


7 




1*97 


Q 

Q 




1.^3 


Q 

y 






10 




2*09 


1 1 


ny 


1 Q1 




• & 1 


1 QP 


13 






H ji 
14 


^ C D 


1 74 
1 * 


15 


152 




1o 


ip 1 




17 


TM-o 




1o 




2 08 


19 




1 * W J 


20 






21 


152 




22 


152 




o% 
dS 

Zk 


1^4-8 
1t"U 


1 .62 


lit? 


1.99 


25    166    2.51
26    163    1.98
27    160    2.11
28    161    2.34
29    158    1.80
30    162    2.39
31    158    2.60
32    163    2.03
33    166    1.56
34    156    1.81
35    162    2.50
36    165    1.81



Mean of fit S.D of fit 
mean square 



1.69 

0.83 

2.19 

2.59 

0.88 

1.79 

1.82 

i.Oif 

2.88 

2.02 

1 .63* 

2.2B 

2.58 

1.77 

1.67 

1.53 
1.08 

1.9^ 
1.71 

2.95 

1.80 
1.17 
2.01 

3.08 

1.53 
2.0if 
2.07 
1.58 

2.79 
3.20 

1.5? 
1.53 
1.33 

1.61 



*Denotes expected S.D. of .35. All other forms have expected S.D. of .28.

values of 1.0. The standard deviations of these mean fit mean square
statistics have expected values of the square root of 2 over the degrees
of freedom for the number of score groups. For 6 and 5 score groups, the
expected standard deviation equals .28 and .33, respectively.
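
The expected standard deviation quoted above follows from standard properties of the chi-square distribution, assuming each fit mean square is a chi-square statistic divided by its degrees of freedom \nu. The values of \nu in the second line are illustrative assumptions chosen because they reproduce the quoted figures; the report does not state the degrees of freedom actually used.

```latex
\[
X \sim \chi^2_{\nu}
\;\Longrightarrow\;
\mathrm{E}\!\left[\frac{X}{\nu}\right] = 1,
\qquad
\mathrm{Var}\!\left[\frac{X}{\nu}\right] = \frac{2\nu}{\nu^{2}} = \frac{2}{\nu},
\qquad
\mathrm{SD}\!\left[\frac{X}{\nu}\right] = \sqrt{\frac{2}{\nu}}
\]
\[
\nu = 25:\ \sqrt{2/25} \approx 0.28,
\qquad
\nu = 18:\ \sqrt{2/18} \approx 0.33
\]
```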

With reference to Table 9.8, it is possible to determine the best- and
worst-fitting test forms. The best-fitting form will have a mean and standard
deviation close to the expected values of 1.0 and .35 (or .28, for 5
instead of 6 score groups). The worst-fitting form will have values furthest
from these expected values. Based on these criteria, the best-fitting
multiple-choice cloze test is Form 2; the worst-fitting is Form 31. There
are no multiple-choice test forms with statistics that deviate radically
from the expected values. However, test calibrations with rather high (i.e.,
above 2.0) mean fit mean squares should be studied in detail to determine the
cause of misfit. Because the forms of the test range from grade
1 to grade 9, it can be concluded that the trait measured by the multiple-
choice cloze test is unidimensional and stable across these grades.

A more detailed analysis of the fit of the multiple-choice cloze data
to the Rasch model is provided in Table 9.9. This analysis is more sensitive
than the previous analysis because the fit statistics are calculated within
score groups. These score groups increase in ability from the first to the
sixth group. Score ranges for the score groups are determined by the program
so as to make the N of each group as equal as possible, based upon a prede-
termined minimum group size.
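
The grouping rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the calibration program used in the study: the function name, the greedy cutting rule, and the default parameters are all hypothetical. Examinees with the same raw score are always kept in the same group, so group sizes are only approximately equal.

```python
from collections import Counter


def score_groups(raw_scores, n_groups=6, min_size=1):
    """Split examinees into score groups of nearly equal N.

    Returns a list of (lowest_score, highest_score, n) tuples ordered from
    the lowest to the highest ability group. A group is closed once the
    cumulative count reaches the next equal-share boundary and the group
    holds at least `min_size` examinees.
    """
    counts = Counter(raw_scores)
    total = len(raw_scores)
    groups = []
    current_scores = []   # distinct raw scores in the group being built
    current_n = 0         # examinees in the group being built
    seen = 0              # examinees assigned so far

    for score in sorted(counts):
        current_scores.append(score)
        current_n += counts[score]
        seen += counts[score]
        boundary = total * (len(groups) + 1) / n_groups
        building_last_group = len(groups) == n_groups - 1
        if not building_last_group and seen >= boundary and current_n >= min_size:
            groups.append((current_scores[0], current_scores[-1], current_n))
            current_scores, current_n = [], 0

    # Whatever remains forms the highest-ability group.
    if current_scores:
        groups.append((current_scores[0], current_scores[-1], current_n))
    return groups
```

Because tied scores cannot be split, a form with a heavily clustered score distribution can end up with fewer groups than requested, which is one way a form could be analyzed with 5 rather than 6 score groups.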

The fit statistics in Table 9.9 are means and standard deviations of chi-square
statistics for testing the fit of each score group. Under the assumption
that the multiple-choice cloze data fit the Rasch model, the mean chi-square statis-
tics have expected values of 1.0 and standard deviations of 1.4. This anal-

Table 9.9



Means and Standard Deviations of the Chi-Square Statistics for Testing the Fit
of Each Item in Each Score Group for Multiple-Choice Cloze Test Forms



I 

•4 



Form 



I 
2 
3 
4 
5 
6 
1 
8 
9 

10 
U 
12 
13 
14 
15 
16 
17 
IB 
19 
20 
21 
22 
23 
24 



First gtoui 
N I SD 



20 
20 
24 
21 
20 
20 
20 
23 
19 
19 
22 
19 
22 
23 
23 
25 
25 
26 
25 
24 
23 
25 
24 



1.9 
1.2 
2.9 
2.9 
1,4 
U 
2.8 
U6 
2.5 
2,9 
2.1 
2.6 



3.2 
1.6 
6.8 
6.8 
1.4 
2.7 
1,3 
3.2 
3.3 
3.9 
2.3 
3.3 



4,8 12.3 
3.2 6.4 



3.1 
2.5 
l.S 
4.6 
3,8 



4.1 
4.6 
2.5 
8,2 
6.5 



4.5 10.3 
3.4 7.9 
4.0 B,7 

3.6 5.9 



N K SD. 



22 
22 
23 
20 
IS 
23 
23 
18 
22 
19 
24 



2.0 
1.7 
2,3 
2.2 
1.4 
1.1 
2,3 
1,1 
4.5 
2.8 
2.0 

20 2.4 6.5 
23 1.4 1,9 
1.1 
1.6 
1.2 



25 
25 

24 



23 0.8 
27 1.7 



26 
26 



3.8 
2.3 
3.6 
6,4 
2.1 
1.4 
3.2 
1.8 
8.0 
6.2 
3.9 



2,3 
2,1 
1.7 

0. 9 

1. a 

2,7 
2.1 



24 4.0 10.1 



1.7 

1,8 

26 3,4 7,6 

27 1,5 2,1 
23 1.3 1.7 
25 2.0 3,0 



Third group 



23 
20 
23 
19 
21 
22 
21 
21 
22 
24 
24 



26 
24 
25 
25 



1.1 
1,0 
1.5 
5.1 
1.2 
4.6 



1.0 
1.1 

l.l 
2.4 
1.1 
2.0 

1.6 1,9 
1.4 2.0 
3.1 S#l 
1.1 1.6 
1,3 1.8 
22 1.4 1.7 
26 1.2 2,2 

26 1,4 

27 1.3 
24 1.2 

24 O.B 

25 1.0 
1,6 
1.4 
1,3 
0.9 



24 0.9 
21 1.3 



1,6 
1.7 
1.3 
0.9 
1.2 
2.0 
1.7 
1.9 
1.1 
0,9 
1.9 



Fourth a^oup 
N i SD 



20 1,2 
19 0.9 
22, 1.2 

21 1.1 



20 
22 
21 
20 
22 
19 
23 
20 
23 
23 
24 
26 
21 



2,0 
0.9 
1.7 
1.2 
1,7 
1,4 
1.5 
1,3 
1.0 
1,2 
1.7 
1.3 
1,4 



24 1,7 
22 1,2 



22 
23 
27 

27 
26 



1,4 
1.7 
1,0 
1,6 
1.7 



1,4 

1,1 

1,4 

1,3 

3,1 

1,4 

1.8 

1,8 

2.5 

1,0 

1.3 

1.5 

1,1 

1.3 

2.3 

1.7 

2.5 

2,2 

1,5 

1.7 

1,9 

1.4 

1.8 

3.2 



Fifth group 
N X SD 



20 
20 
23 
21 
22 
21 
20 
22 
25 
21 
24 
18 
25 
28 
27 
25 
26 
25 
23 
25 
25 
25 
27 
25 



1.5 

1.3 

U3 

1,4 

1,4 

1.5 

1.7 

1.8 

1,9 

2.1 

2.7 

2.1 

1.7 

1.9 

2.5 

1.8 

1.3 

1.6 

1.3 

1.2 

1.7 

2.2 

1,4 

1,3 



2.2 
1,8 
1,4 
2.2 
1.6 
2.1 
2.0 
2.2 
2.6 
2.9 
4.5 
2.1 
2.6 
3,7 
2.9 
4,1 
1,6 
2,7 
1,6 
1.0 
1,9 
3,1 
2.1 
1,4 



Sixth group 



N X SD 



21 
23 
13 
22 
23 
18 
20 
21 
17 
24 
0 
22 
28 
23 
26 
27 
27 
24 
31 
28 
30 
23 
23 
26 



2.0 

1,7 
1.7 
2,0 
1.4 
2,1 
1.7 
1.5 
1,4 
2.3 

0. 0 
1.8 
2.0 

1. e 

2.7 
1.7 
1.9 
1,9 
1.6 
2.7 
1.9 
1.2 
1.0 
L6 



4.8 
2.5 
3.4 
2.3 
1.6 
4,1 
2.8 
3,5 
3.3 
3.0 
0.0 
3.1 
5,3 
3.4 
5.0 
2.7 
5,0 
3.1 
3.7 
8.9 
3.4 
1.3 
1.3 
2.4

Table 9.9 (Continued)



Student sub-groups



I 



Foil, 

25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 



First gro, 
N I SO 



27 
2B 
25 
27 
26 
27 
25 
27 
29 
26 
25 
28 



5.2 1A9 
2.9 6.3 



3,6 

3,5 

2,6 

2.4 

3, 

3.3 

3.8 

2.5 

3,1 

3,3 



7i2 
5.9 
4.0 
3,6 



4,9 
7,4 
3,6 
7,8 
5.7 



Second group 
N i SD 



N X SD 



roup FouEtii gtoup 
N i SB 



Fifth group 
K X SD 



Sixth mm 
N 1 ID 



29 


1.9 
2,3 


3.0 


27 


1,7 


2.5 


23 


2.1 


4.9 


26 1.5 2.0 


34 


2.7 

1 Q 


30 


2.8 


25 


1,4 


1.6 


27 


2.1 


2.6 


25 1.3 1.* 




27 


1,7 


3.3 


25 


1.3 


1.4 


24 


1,8 


2.2 
2.8 


25 1.7 2,7 


34 


2,5 


26 


2.9 
2,0 


3.0 


26 


1.1- 


,1.5 


28 


2.2 


30 2.2 3.1 


24 


2,2 
2.1 


27 


4.9 


25 


1.2 


1.4 


29 


1.9 


2.7 


26. 1,1 3^5 


25 


28 


1.6 


2,2 


29 


1.8 


2.1 


26 


2,0 


2.9 


29 4.7 12,9 


23 


i.a 


27 


3,1 


4.0 


25 


1.2 


2.1 


26 


2.1 


2.6 


23 2.4 2,7 


32 


3.1 
1.9 


29 


1.8 


2.9 


28 


1.4 


1.7 


29 


2.1 


3.2 


29 l.B 2.0 


21 


27 


0.9 


1,4 


27 


1.1 


1.6 


26 


1.0 


1.2 


29 1.3 1.7 


28 


1.3 
1.8 

2,3 
1,4 


26 
27 
25 


2,1 

3.6 
1.7 


3.5 
6.6 

2.0 


28 
26 
30 


1.2 
1.8 
1,3 


1.6 
2,4 
1.3 


28 
27 
31 


1.4 
2.4 
1.5 


i.6 
2.8 
2,1 


25 l.B 2.0 

27 2.0 2,2 

28 1.6 3.0 


23 
30 
23 



3.3 
2.2 
4,2 
6.7 
3.5 
4.3 
5.4 
2.5 
2.0 
3,1 
3,6 
4.0

ysis pinpoints the location of "misfit" with the Rasch model within a par-
ticular ability group. It must be emphasized that "misfit" is relative in
this and subsequent analyses because it is primarily due to student ability
and not test items. (In actual practice, when one is calibrating items on a
test, one removes the students, not the items, that are causing the "misfit.")

Again, the best-fitting multiple-choice test is Form number 2. The
greatest deviation from expectation is found in the high ability group with
a mean of 1.7 and a standard deviation of 2.5. (Form 2 had practically a
normal distribution of scores.) The multiple-choice cloze test with the
poorest fit is again Form number 31. Form 31, with the ability groups so
specified, does not fit well in 3 of the 6 groups. In addition, the standard
deviations of the mean fit statistics are far removed from expected values.
These results are consistent with the previous analysis.

The results in Tables 9.8 and 9.9 support the conclusion that the trait,
namely literal comprehension, measured by the multiple-choice cloze test is
unidimensional and stable from grades 1 to 9.

A final point should be noted concerning the fit, within ability groups,
of the multiple-choice cloze test. It is commonly observed in Rasch analyses
of student test data that "misfit" is constrained to low and high ability
groups. In the case of the multiple-choice cloze, the low ability groups
are causing the most problems. As previously noted, in calibration work,
several low-ability students would be deleted and the Rasch analysis rerun
on the same form. This second run would display a better fit of the data
to the Rasch model.

Table 9.10 presents the mean and standard deviation of the fit mean
square statistics for all items in each form of the wh-item test. When
compared to the same statistics for the multiple-choice cloze test, the
values for the wh-item test are closer to expected values. This finding
was expected because more stringent passage controls were used in the
development of the wh-item test than the multiple-choice cloze test. This
consistency is reflected in all of the values reported in Table 9.10. Note
that the worst-fitting forms are 46 and 62.

Table 9.10

Mean and Standard Deviation of the Fit Mean Square Statistics
for Each Form of the Wh-Item Test

                Mean of fit     S.D. of fit
Form     N      mean square     mean square

 37     126        2.06            2.09
 38     121        1.74            [illegible]
 39     122        1.94            1.26
 40     120        2.15            1.88
 41     114        2.38            2.25
 42     120        2.34            2.39
 43     123        1.84            1.26
 44     123        1.85            1.85
 45     129        2.27            2.12
 46     123        4.96           13.23
 47     118        2.19            2.22*
 48     106        1.42             .89*
 49     142        1.36            1.07
 50     149        1.61            1.30
 51     147        1.79            2.31
 52     148        1.73            1.58
 53     141        1.46            1.15
 54     137        2.12            1.98
 55     142        1.49            1.44
 56     137        1.30            1.19
 57     140        1.10            0.69
 58     144        1.63            1.21
 59     136        1.56            1.17
 60     142        1.46            1.83
 61     163        1.69            3.03
 62     161        4.72           18.06
 63     153        2.14            2.46
 64     154        1.56            0.97
 65     165        1.42            1.79
 66     161        1.59            1.96
 67     145        1.47            1.49
 68     160        1.62            1.27
 69     160        1.49            1.08
 70     146        1.19            0.83
 71     162        3.05           10.86
 72     145        1.38            1.44

*Denotes expected S.D. of .35. All other forms have expected S.D. of

A more detailed fit analysis of the wh-item is provided in Table 9.11.
Again the worst-fitting wh-item tests are Forms 46 and 62. Other than these
extreme forms, there is a very consistent pattern of effects in Table 9.11.
The low ability group seems to account for nearly all of the extremes in
misfit. Put another way, if some of the low ability students in these
analyses were removed and the wh-item test data recalibrated using the Rasch
model, the fit of the data to the model would be even more consistent.
Generally, from these results it can be concluded that the trait measured by
the wh-item test, which is also hypothesized to be literal comprehension, is
also unidimensional and stable from grades 1 through 9.

The conclusions drawn from the Rasch analyses support the conclusions
drawn in the previous section on test comparability and reliability.
Generally, the items on the various wh-item and cloze test forms contribute to
the measurement of a single, homogeneous trait.

Table 9.11



Means and Standard Deviations of the Chi-Square Statistics for Testing the
Fit of Each Item in Each Score Group for Wh-Item Test Forms



sacoji^ ai^ Mth^ ate miBL 



Form 


N 


X 


SD 


N 


K SD__ 


37 


21 


2.3 


3.2 


21 


3.1 6.3 


38 


21 


2.5 


2.7 


24 


2.2 4,5 


39 
40 


1§ 
21 


3.2 
5.7 


5,5 

Oil 


19 
19 


2,4 3.7 
2,0 2.9 


41 
42 
43 


22 


4,6 




22 


1.5 2.4 


20 
20 


JiJ 


t Q 

5*0 


19 
21 


3.6 10.9 
1.5 2,1 


44 
45 


21 
22 


4.6 
4,5 


8i8 


20 
22 


1.1 1.3 
2.9 7.3 


46 


20 


6.0 


lO.l 


If 15.7 66,8 


47 


LI 


0 1 




25 


2.7 5.6 


48 


21 


1.7 


2.3 


20 


1,6 2.5 


49 
50 
51 


21 
26 

26 


1.5 

1.4 

3.7 


2.0 

1.6 ■ 
6.7 


25 

21 
20 


1.1 1.3 
1.0 1.2 
0,8 1.0 


52 


27 


3.6 


6.5 


26 


1.3 1.6 


53 


24 


2.1 


3.6 


22 


0.7 0,8 


54 


22 


4.3 


7.7 


24 


1.3 1.9 


55 


24 


2.2 


3.6 


26 


1.1 1.3 


56 


24 


2.7 


4.S 


25 


0,9 1.1 


57 


22 


1.2 


L3 


24 


0.8 1,0 


58 


21 


3.0 


4.4 


25 


1.2 1.6 


59 
60 


21 
23 


2,0 
3,3 


3.0 
8.5 


24 
25 


1.2 1.6 
0,8 1,0 



21 U 1.5 21 l.S 

22 1,8 2.0 24 1.3 

20 2.6 2.5 23 1.0 

21 1.6 1.5 20 1.3 

22 3.0 3.1 24 1,5 

22 2.2 2.5 20 1.8 

f ^ z r: n l5 "2.1 21 1.? 1.6 n 1.6 

18 1.8 2,0 17 1.0 

25 2,1 2,3 18 m 

18 2.0 2.4 24 2.6 
24 1,8 2,9 23 1.5 

23 1.1 1.3 22 1,5 

19 0.8 0,7 26 1,6 

26 1,3 1.7 . 27 1.7 

27 1.3 1.6 30 1.3 
27 1.6 1.7 24 1.7 
26 1.3 1.4 16 1*6 

22 1.5 1.7 16 1.0 
15 1.1 1.5 18 1.1 
26 1,4 1.5 25 0,8 

23 l.O 1.3 27 0.8 
19 1.2 1.7 19 1.3 
26 1.4 2.0 22 1.5 
26 0.9 1.1 22 1.0 



SD 


N 


1 


SD 


N 


X 


SD 


1,7 


24 


1.8 


2.1 


18 


i.a 


3.4 


2.1 

1.1 


23 
17 


2,0 
1.2 


3,5 

2.0 


7 

24 


0.5 
1.3 


1,1 
2,2 


1.3 


20 


1,1 


1.3 


19 


1,3 


n "7 
it 1 


1.6 


24 


1,4 


2.0 


0 


0.0 


0,0 


2.0 


22 


1.3 


1.5 


17 


1.1 


2,1 

1.8 


1.4 


20 


1.3 


1.3 


24 


1,5 


1.1 


24 


1,2 


1.4 


23 


1.4 


2,4 


1.7 


19 


1,8 


2.5 


23 


1.3 


1.7 


2.7 


19 


2.0 


2,0 


23 


1.6 


3.4 
0,0 


1.4 


24 


2,9 


5,8 


0 


0,0 


1.6 


20 


1.2 


1.5 


0 


0,0 


U.U 


2,3 


30 


1.4 


3.0 


21 


1.8 


3.3 


2.1 
1.7 


22 


1,0 


1.2 


27 


3.3 


5.2 


22 


2.1 


4,7 


22 


1,5 


3.7 


1.3 


29 


1,6 


2.6 


15 


0.6 


1.2 


3.0 


18 


1.0 


1,3 


35 


2,1 


2,9 
6,3 


0.9 


30 


2.1 


2,3 


23 


2.4' 


1.2 

0.8 


23 


1,0 


1.2 


36 


2.3 


6,5 


27 


0.7 


0,9 


10 


1.2 


2,3 


1,0 
2.0 


20 


1,1 


1.3 


24 


l.B 


3.2 
1.6 


:23 


1,5 


2.3 


37 


1.6 


2.2 


26 


2,2 


3,4 


17 


1.2 


1,5 


l.l 


24 


1.2 


1.4 


22 


1,7 


3.4

Table 9.11 (Continued)



Student sub-group



61 

62 
63 
64 
65 
66 
61 
68 

f 69 

, ; 

72 



if I 



27 3.0 7.6 

26 20,4 9i.2 
25 3.7 8.a 

2.4 3.2 

2.5 8.5 
2,2 3.3 
2.0 2*9 
hi 4,7 
2.5 5.4 
1,8 3.0 

27 12.3 62.4 
25 1.9 3.0 



22 
32 
27 
28 
25 
28 
25 



Second group 



Fourth iroup 







iD 


N 


1 


SO 


N 


I 


W 


29 


2,3 


6.7 


33 


i,e 


2.8 


23 




Oi/ 


25 


2,8 


a.e 


22 


1,3 


1,3 


1 n 

19 






25 


L5 


1,9 


29 


1,3 


l.i 








2B 


1,0 


liO 


23 


1.5 


1.9 


25 


1.9 


2.7 


19 


1.0 


1,5 


27 


1,0 


1,5 


15 


0,6 


0.6 


26 


1,3 


l.i 


31 


1.2 


1.7 


22 


1.6 


2.8 


22 


0,6 


1.1 


19 


0.7 


0.7 


27 


1.4 


3.0 


26 


1,3 


1.4 


25 


1.1 


1.2 


30 


1.0 


1.2 


24 


1,6 


2.3 


32 


1.6 


2,2 


16 


1,0 


1.1 


19 
32 


1,1 


1.3 


19 


1,4 


1.2 


26 


0,9 


2.0 


1.2 


1.7 


33 


0.9 


1,0 


22 


0.8 


1.0 


26 


1,2 


1.5 


24 


1.1 


1.6 


29 


1,7 


2.1 



Flftli group 
N i ID 



31 

20 
22 
21 
27 



1.7 2.4 

1.0 1.4 

1.1 1.2 

1.2 2.0 
1.4 1.8 

26 1.6 2.8 

31 2,0 3.2 

30 l.B 2,4 

22 0.9 1.1 

16 0.7 0.7 

17 1.6 4.2 
24 1.2 1.9 



i 1 SD 



20 
49 
20 
33 
45 
29 
18 
24 
38 
41 
31 
17 



0,4 1,3 

1.9 5.0 

2.6 9.0 

1.4 1.7 

2.1 2.9 



1.7 

2,0 

i.a 

1.3 
1.3 



514 
5,2 
2.1 
1.6 
2,0 



1,6 2.5 
1.2 3.8

Construct Validation

The principal focus of construct validation, as discussed in Chapter
VI, is the explication of the network of interrelated concepts that define
the trait in question and the conditions and interpretations that surround
its measurement. One accepted approach to construct validation is to
examine the convergence and divergence between the principal measure in
question and other measures that are indicators of the same construct or
can be discriminated from the construct. Another approach is to examine
the convergence of two alternate measures of the same construct, obtained
by dissimilar methods, with other measures that are indicators of the same
construct or can be discriminated from the construct (Cronbach, 1971).

In the present analysis, the first approach consisted of analyzing
expected levels of correlation between multiple-choice cloze scores and
various other test scores available in the study--the wh-item test, the
CAT skill scores and subtest scores, language and non-language IQ test
scores, and test-wiseness scores. In the second approach, the convergence
of the multiple-choice cloze and wh-item tests as measures of literal
comprehension was evaluated in a simultaneous comparison of the values of
the correlations of both of these tests with the other test criteria
available in the study.

In the first approach, convergent validity is evaluated by the general
consistency with which predictions are confirmed in terms of relative levels
of correlation between the multiple-choice cloze test and other measures
in the study considered to be similar indicators of the construct of literal
comprehension or related or unrelated indicators of other constructs. Some
measures are expected to correlate relatively highly with the multiple-
choice cloze test, others relatively moderately, and still others are
predicted to have a low correlation. In the second approach, convergent
validity is determined by reference to the size of the differences in the
absolute values of the correlations of the multiple-choice cloze and wh-item
tests with the other test criteria available in the study. These absolute
differences are expected to be small, consistent from measure to measure,
and generalizable across levels of the study population.
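
The second approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the dictionary layout are hypothetical, and the difference is taken between the absolute values of the two correlations, which is one reading of the wording above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def convergence_table(cloze, wh, criteria):
    """Compare how two measures correlate with each criterion measure.

    `criteria` maps a criterion name to its score list. For each criterion
    the result holds (r_cloze, r_wh, difference), where the difference is
    | |r_cloze| - |r_wh| |. Small, consistent differences across criteria
    would count as evidence of convergence.
    """
    table = {}
    for name, scores in criteria.items():
        r_cloze = pearson(cloze, scores)
        r_wh = pearson(wh, scores)
        table[name] = (r_cloze, r_wh, abs(abs(r_cloze) - abs(r_wh)))
    return table
```

Running the same comparison separately within each test level would address the requirement that the differences generalize across levels of the study population.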

In general, the predictions indicate that the multiple-choice cloze
test is relatively highly related to the wh-item test, unrelated to the
measure of test-wiseness, moderately related to the measure of language IQ,
and related in varying degrees to the CAT, depending on the "apparent
meaning" of a skill or subtest score denoted by the test label. Both the
labels and specific item content of these CAT skill and subtest scores
present some difficulties in interpretation, inevitably leading to ambiguity
concerning whether a given test is a similar measure of the construct or
has some other relationship with literal comprehension. Where these
ambiguities arise in the present analysis, disconfirming predictions, an attempt is
made to resolve the problem by examination of the test items in question. Such
an analysis, however, is potentially fraught with the usual problems of all
post hoc analyses. That is, some explanation of the disconfirming event can
usually be found, and for this reason, a post hoc analysis must be taken as
exploratory or hypothetical.

Convergent and Divergent Validity of the Multiple-Choice Cloze

It was assumed that predictions about the correlations between the
multiple-choice cloze test and the other test criteria available in the
study could be made on the basis of the construct definition of literal
comprehension and the information at hand defining the content of the test
criteria. In order to understand the rationale for these predictions, it
is instructive to review briefly the construct definition and the availability
of information that constitutes an adequate definition of each of the
various potential indicators or non-indicators of it.

The construct: the multiple-choice cloze test. The construct, literal
comprehension, and its relationship with the multiple-choice cloze test
are stated in detail in Chapter IV of this proposal. In brief, literal
comprehension is the apprehension of "the grammatical and semantic relations
which obtain within and among the sentences of the discourse" (Katz and Fodor,
1967, p. 172). The multiple-choice cloze test accesses these grammatical
and semantic relations by systematically deleting nouns, verbs, and modifiers¹
from a segment of written discourse, and then placing the deleted words in
sets of responses where the distractors are all grammatically plausible
but semantically implausible. It is hypothesized that students will have
no difficulty in selecting the only word which is grammatically and
semantically plausible if they can apprehend "the grammatical and semantic
relations which obtain within and among the sentences of the discourse."
The distractors, in other words, do not function as traditional distractors--
do not, in fact, "distract"--until the syntactic and semantic complexity
of the discourse exceeds the students' psycholinguistic competence. The
test is designed, therefore, to discriminate between a specifiable set of
interactions--called literal comprehension--between student and text, and
another specifiable set of interactions between student and text called
no comprehension. The test is designed, that is, to measure literal
comprehension or no comprehension and nothing else. The interactions between
student and test--the extensiveness of the processing of the grammatical
and semantic relations in the text--are carefully controlled by the type
and rate of deletion and the distractor selection procedure. The item type
is hypothesized to access only literal meaning; it should access no nuances
of meaning and no other semantic interrelationships than those clearly
signaled in the grammatical and semantic relations of the text.

¹Only nouns and verbs are deleted in grade 1 and 2 materials.
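
The deletion-and-distractor procedure described for the multiple-choice cloze can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function name, the part-of-speech tags ("NOUN", "VERB", "MOD"), the fixed deletion rate, and the distractor pool are hypothetical and do not reproduce the study's item-writing rules. The sketch only matches distractors to the deleted word by part of speech, which approximates grammatical plausibility; it does not check that a distractor is semantically implausible, which the actual procedure requires.

```python
def make_cloze_items(tagged_tokens, distractor_pool, every_nth=2,
                     n_distractors=3, deletable=("NOUN", "VERB", "MOD")):
    """Build multiple-choice cloze items from part-of-speech tagged text.

    `tagged_tokens` is a list of (word, tag) pairs; `distractor_pool` maps a
    tag to candidate words of that part of speech. Every `every_nth`
    deletable word is replaced by a blank and offered with distractors of
    the same tag. Returns (passage_with_blanks, items).
    """
    passage = []
    items = []
    deletable_seen = 0
    for word, tag in tagged_tokens:
        if tag in deletable:
            deletable_seen += 1
            if deletable_seen % every_nth == 0:
                candidates = [w for w in distractor_pool.get(tag, []) if w != word]
                options = sorted([word] + candidates[:n_distractors])
                items.append({"answer": word, "options": options})
                passage.append("____")
                continue
        passage.append(word)
    return " ".join(passage), items
```

Restricting `deletable` to `("NOUN", "VERB")` would correspond to the rule given for grade 1 and 2 materials.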

The construct: the wh-item test. The wh-item test is designed to
access the same grammatical and semantic relations of a given text. Like
the multiple-choice cloze, the wh-item test is considered an indicator of
the construct, literal comprehension. The wh-item accesses the grammatical
and semantic relations of a text by deleting immediate constituents in
clauses of the sentences in the text, replacing them with the appropriate
wh-words (who, what, which, where, when, how, or why), and then transforming
the clauses into questions.²

The wh-item, then, is the traditional question type teachers use to
direct student attention to salient features of the text, and correct answers
to such questions are usually considered evidence of literal comprehension.
The primary difference between such traditional questions and the wh-item
test is the systematic way in which the wh-item is written. Such
systematization makes it possible to specify and control, to a greater degree than
possible with traditional test questions, the interactions between the
features of the text and the psycholinguistic competence of the student.
Since the wh-item test has some claim to a specifiable relationship
with the construct definition, it is considered the preferred or least
ambiguous criterion measure for the multiple-choice cloze in the analysis
that follows. The primary difficulty with using the wh-item as a criterion
measure is that it is subject to a form of test-wiseness discussed in
Chapter II. In brief, it is possible for a student with minimal syntactic
competence to locate the correct answers to wh-items in the text without
understanding what the question or the text means. However, as will be
seen throughout the course of the analyses, there is no reason to suspect
that the wh-item test was actually subject to this form of test-wiseness
in the study samples.

²The wh-item test is described in more detail in Appendix A.

The construct: other criterion measures. The test-wiseness measure
used in this analysis was reviewed in the introduction to this chapter and
is not to be confused with the form of test-wiseness just discussed. The
test-wiseness measure used in the analysis is a preliminary effort to
determine the passage dependence of the wh-items. Passage dependency is
also crucial to the construct, literal comprehension. That is, the test
items must access only the grammatical and semantic relations which obtain
within and among the sentences of the discourse.

The language and non-language IQ scores reported in the following
analysis come from the Short Form Test of Academic Aptitude. In the
development of the construct through Chapters I to IV, it was hypothesized
that language IQ scores should only correlate moderately with scores from
tests of literal comprehension, since the literal level of comprehension
requires little of the inferential and related reasoning processes so
characteristic of measures of verbal intelligence. Non-language IQ scores
would seem to have little or no relation to the construct of literal
comprehension and should thus correlate to a lesser degree with the
multiple-choice cloze than language IQ.

The problems that exist in specifying the relationship between the
construct definition of literal comprehension and each of the skill and
subscores of the CAT have already been noted. Given the lack of an explicit
statement defining the psycholinguistic meaning of each CAT skill and
subscore used in the analysis, it was necessary to define the relationship
of these skills with the construct by recourse to the test score labels and
the meager skill or subscore descriptions given in the Test Coordinator's
Handbook (1970).

CAT skill scores: predicted correlational levels. The predicted
correlational levels between multiple-choice cloze scores on the one hand,
and the wh-item test, CAT skills, and language and non-language IQ scores
on the other, are presented in Table 9.12. These predictions are based
upon the relative degree to which it is expected the different measures
will converge upon or diverge from the construct, literal comprehension.
The predictions for the CAT scores have been based on the labels or brief
descriptions attached to a skill score or subscore, and the consistency
of the application of the labels is assumed.

Table 9.12

Expected Levels of Correlation of Multiple-Choice Cloze Scores with
Wh-Item Test Scores, California Achievement Test Skill Scores,
Test-Wiseness Scores, and Language and Non-Language IQ Scores

Lowest (.00-.29)       Medial (.30-.54)     Highest (.55+)

Test-Wiseness          Language Usage       Wh-Item Test Scores
Language Mechanics     Language IQ          Reading Vocabulary
Non-Language IQ                             Reading Comprehension



The highest predicted levels of correlation, as evidenced by Table 9.12,
are between the multiple-choice cloze scores and the wh-item test scores,
the CAT reading vocabulary, and the CAT comprehension scores. The crucial
correlation is, of course, between the multiple-choice cloze and the wh-item
test, the preferred criterion measure. The two experimental tests must
correlate highly with each other. A strong prediction is also made about
the correlations between the multiple-choice cloze scores and scores from
the test-wiseness measure, for reasons already cited. The remaining
predictions are less strong because the relationship between the construct
and the remaining measures is not so clear. Language mechanics and
non-verbal IQ scores are expected to have low correlations with
multiple-choice cloze scores, while language usage and language IQ scores
should fall in the medial range.

Also implicit in Table 9.12 is the assumption that the multiple-choice
cloze test will behave consistently across grade levels as long as passages
are properly matched in readability with the psycholinguistic competence
of the students. Such consistent behavior is crucial to the possibility
of using the same item type to measure literal comprehension, regardless
of the content of the test passages or the reading ability of the student.
Hence, the predictions in Table 9.12 are not made by test level.
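
The three bands of Table 9.12 amount to a simple threshold rule. A minimal Python sketch of that rule follows. The function name `correlation_level` is an illustrative assumption and not part of the report, and the sketch assumes non-negative correlations reported to two decimal places, as in the tables of this chapter.

```python
def correlation_level(r):
    """Map a correlation to the bands of Table 9.12.

    Hypothetical helper: assumes r is a non-negative correlation
    reported to two decimal places.
    """
    # Compare in integer hundredths so that values such as .29 or .55
    # are not misplaced by floating-point rounding.
    hundredths = round(r * 100)
    if hundredths < 0:
        raise ValueError("bands are defined only for non-negative correlations")
    if hundredths <= 29:
        return "Lowest"
    if hundredths <= 54:
        return "Medial"
    return "Highest"


print(correlation_level(0.29))  # upper edge of the lowest band
print(correlation_level(0.54))  # upper edge of the medial band
print(correlation_level(0.55))  # lower edge of the highest band
```

A prediction then counts as confirmed when the band of the observed correlation equals the band predicted for that measure.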

Actual correlational levels. The actual correlational levels between
multiple-choice cloze scores and the criterion measures previously discussed
are given in Table 9.13. When the predicted correlational level matches the
actual correlational level, the actual correlation is underlined. As can
be seen in Table 9.13, 19 out of 30 predictions were confirmed. Several
others, notably language usage scores, Levels III and IV, were close to
predicted levels. More importantly, beyond the consistent pattern of
confirmation, the crucial correlational levels (namely the wh-item test, the
vocabulary, the CAT comprehension, and the test-wiseness scores) were
all confirmed. It will be noted, however, that the correlational levels
fall off consistently in Level IV. Preceding sections of this chapter have
already analyzed this phenomenon. What is important here is to note that
the correlational pattern in Level IV remains consistent with preceding
CAT levels in spite of the reduced values. No attempt will be made to
explain the failed predictions in the case of language usage, language
mechanics, language and non-language IQ until the actual test items are
examined and ordered in relation to the construct, literal comprehension.
Further analysis of correlational levels between CAT skill scores and
multiple-choice cloze scores is taken up in finer detail in the next section
by breaking the CAT skill scores into their component subscores.

Table 9.13

Actual Zero-Order Correlations of Multiple-Choice Cloze Scores with
Wh-Item Test Scores, California Achievement Test Skill Scores,
Language and Non-Language IQ Scores, and Test-Wiseness Scores

                                        Test Level
                       I            II            III           IV(a)

Wh-Item Test          _.68_        _.74_(c)      _.73_         _.56_
Vocabulary            _.67_        _.75_         _.69_         _.55_
Comprehension         _.62_        _.78_         _.72_         _.55_
Language Usage        _.51_         .73           .57           .55
Language Mechanics     .56          .71           .68          [illegible]
Language IQ           _.35_        _.54_          .62          (b)
Non-Language IQ        .45         [illegible]    .52          (b)
Test-Wiseness         _.23_        _.26_         _.29_         _.23_

Note. Underlined values are within the predicted correlational levels.

(a) Level IV scores represent only grade 8 students in the CAT scores,
instead of the 7th and 8th grades intended.

(b) The Level IV sample did not receive the IQ test.

(c) The r of wh-item vs. cloze in the combined grades 1 to 3 is .81.
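
The tally of confirmed predictions can be checked mechanically. The Python sketch below does so for the rows of Table 9.13 that are fully legible in this copy, using the bands of Table 9.12. The names `PREDICTED`, `ACTUAL`, `band`, and `confirmed_counts` are illustrative assumptions; the rows with missing or illegible values (Language Mechanics and the two IQ rows) are left out, so the totals here cover 20 of the 30 predictions rather than all of them.

```python
# Predicted band for each measure (Table 9.12).
PREDICTED = {
    "Wh-Item Test": "Highest",
    "Vocabulary": "Highest",
    "Comprehension": "Highest",
    "Language Usage": "Medial",
    "Test-Wiseness": "Lowest",
}

# Actual correlations with the multiple-choice cloze at CAT Levels I-IV
# (Table 9.13), restricted to rows that are fully legible.
ACTUAL = {
    "Wh-Item Test": [0.68, 0.74, 0.73, 0.56],
    "Vocabulary": [0.67, 0.75, 0.69, 0.55],
    "Comprehension": [0.62, 0.78, 0.72, 0.55],
    "Language Usage": [0.51, 0.73, 0.57, 0.55],
    "Test-Wiseness": [0.23, 0.26, 0.29, 0.23],
}


def band(r):
    """Return the Table 9.12 band for a two-decimal correlation."""
    hundredths = round(r * 100)  # integer comparison avoids float edge cases
    if hundredths <= 29:
        return "Lowest"
    if hundredths <= 54:
        return "Medial"
    return "Highest"


def confirmed_counts(predicted, actual):
    """Count, per measure, the levels at which the prediction is confirmed."""
    return {
        name: sum(band(r) == predicted[name] for r in values)
        for name, values in actual.items()
    }


counts = confirmed_counts(PREDICTED, ACTUAL)
for name, n in counts.items():
    print(f"{name}: {n} of {len(ACTUAL[name])} confirmed")
print("total:", sum(counts.values()), "of", sum(len(v) for v in ACTUAL.values()))
```

On these five rows the sketch finds 17 confirmations out of 20; only Language Usage falls outside its predicted band at more than one level.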

CAT subscores: predicted correlational levels. Table 9.14 presents
the predicted correlational levels between multiple-choice cloze scores and
subscores on the CAT. Again, the predictions are expected to hold regardless
of grade levels or variations in content of test passages. Predictions
are also based on the relationships between the construct, literal
comprehension, and the CAT subscores, insofar as that can be determined from
the meager descriptions of the subskills in the CAT Test Coordinator's
Handbook (1970). For purposes of this analysis, it is again assumed that
the CAT labels are applied consistently to test items.

Table 9.14

Predicted Levels of Correlations between Multiple-Choice Cloze
Test Scores and California Achievement Test Subscores

               Lowest (.00-.29)          Medial (.30-.54)        Highest (.55+)

Vocabulary     Sentence-Picture Assoc.   Picture-Word Assoc.     Words in Context
subscores      Beginning Sounds          Word Recognition
               Ending Sounds
               Letter Recognition
               Word Form

Comprehension  Inferences                Relationships           Facts
subscores                                Generalizations         Interpretations
                                         Reading-General
                                         Reading-Soc. Studies
                                         Reading-Science
                                         Reading-Maths

Language       Sentence Structures       Standard English
subscores      Sentence Parts
                 and Functions
               Transformations



Table 9.14 indicates that the most important correlations are between
the multiple-choice cloze scores and the CAT Words in Context, Facts, and
Interpretation subscores. They are expected to correlate highest with
the multiple-choice cloze since they seem to access "the grammatical and
semantic relations which obtain within and among the sentences of the
discourse." Strong predictions are also made that the CAT Inferences
subscores will correlate least with multiple-choice cloze scores since
inferential and related reasoning processes require reasoning beyond the
literal meaning of the test passages. Such items are also liable to be least
passage dependent, contrary to the demands of the construct, literal
comprehension. Another strong prediction is that those CAT items that measure
phonological skills (Sentence-Picture Association, Beginning Sounds, and
Ending Sounds) will correlate lowest with multiple-choice cloze scores since
the model of reading as a constructive language process (described in
Chapters II through IV) behind the construct, literal comprehension, posits
that phonological processes are ordinarily bypassed in processing written
discourse. The remaining subskills are parceled out according to their
apparent relationship with the construct. The reading scores in the subject
areas on the CAT are expected to correlate in the medial range with
multiple-choice cloze scores since the subject area scores seem to subsume
the full range of comprehension subskills from facts to inferences.

Actual correlational levels. Table 9.15 gives the actual correlational
levels between multiple-choice cloze scores and CAT subscores. As Table
9.15 indicates, 27 of 47 correlations fall within the predicted levels.
An additional five subscores, including the important Words in Context,
Level IV, and Interpretation, Level I, are very close to predicted levels.
The consistent pattern, then, is to confirm predictions about correlational
levels with CAT Words in Context, Facts, and Interpretation subscores and
Test-Wiseness scores. As noted previously, correlational values fall off
in Level IV as a result of a reduction in the range of student ability
represented at that level; consequently, there are disconfirmations of
predicted correlational levels with Words in Context, Facts, and
Interpretation at Level IV. The general pattern, however, is still evident in
Level IV. That is, with the exception of an unexplained aberration with
Sentence Parts and Functions, the highest correlations at Level IV are
Words in Context and Interpretation. The results, then, tend to substantiate
assumptions about the relationships between the construct, literal
comprehension, its principal indicator, the multiple-choice cloze, and
the CAT subscores.

Table 9.15

Actual Zero-Order Correlations between Multiple-Choice Cloze Test
Scores and California Achievement Test Subscores by CAT Level

                                                 CAT Level
                                 I             II            III           IV(a)

Vocabulary Subscores
  Sentence-Picture Association   [illegible]
  Beginning Sounds               .45
  Ending Sounds                  .34
  Letter Recognition             .11
  Word Form                      .28
  Picture-Word Association       .55
  Word Recognition               .39           .56
  Words in Context               .67           .75           .70           .51

Comprehension Subscores
  Facts                          .61           .76           .67           .42
  Interpretation                 .54           [illegible]   [illegible]   .44
  Relationships                                              .43           .33
  Generalizations                              .73           .51           .41
  Inferences                     .50           .73           .35           .41
  Reading-General                                            .62
  Reading-Social Studies                                     .58           .42
  Reading-Science                                            .58
  Reading-Mathematics                                        .51

Language Subscores
  Standard English               .45           .73
  Sentence Structure                                         .37           .41
  Sentence Parts and Functions                               .31           .50
  Transformation

Note. Underlined values were within level of correlation predicted.

(a) Grade 8 students only.
(b) Five or fewer items on the CAT.

Inconsistent correlational levels. Despite the general tendency
to corroborate the assumptions behind the predictions, there are a
considerable number of disconfirmations evident in Table 9.15 that require
further analysis, even in this preliminary investigation. The most notable
inconsistency is the unexpectedly high level of correlation between
multiple-choice cloze scores and Inferences subscores, especially at Levels I
and II. In the following discussion of these inconsistencies, the
assumptions behind the predictions are examined in more detail, and then
the CAT inference items themselves are reviewed in relation to the construct,
literal comprehension.

Any interpretation of written discourse involves "inferential" processes.
As noted in Chapter II, meaning is not in the text; rather, meaning is in
the reader and the writer, and what appears on the printed page is only an
approximation of the meaning intended by the writer or apprehended by the
reader. The text contains only orthographic clues to meaning. The reader
must "infer" grammatical and semantic relations in and among the sentences
of the discourse from the linguistic clues to such relations in the text.
But these "inferential" processes are language-specific; that is, they are
part of the grammar of the language and are, therefore, well within the
processes of the construct, literal comprehension: the apprehension of
"the grammatical and semantic relations which obtain within and among the
sentences of the discourse."

On the other hand, "inferences," as commonly understood in
educational psychology, refer to deductive and related formal reasoning
processes that are not part of the grammar of a language. Such inferential
processes are quite beyond the psycholinguistic processes of literal
comprehension, and require not only the apprehension of the literal meaning
of a text but also [illegible] between literal meaning and other information
not in evidence in the text. As noted in Chapter I, such inferential
processes tend to subordinate the information in the text to extra-textual
information, thus reducing the loading of literal comprehension in the test
and the passage dependency of the test items. It was with these kinds of
inferential processes in mind that the predictions regarding correlational
levels between multiple-choice cloze scores and CAT Inferences subscores
were made. The predicted correlational levels were low; but, as noted above,
the actual levels were medial and high. The CAT items were examined in an
attempt to explain the inconsistency.

An examination of the CAT test items at Levels I and II implies a
vague, global notion of inference. That is, test items that vary greatly
in the kinds of demands they make upon the reasoning processes of the
student are all subsumed under the label, "Inferences." Item numbers 3,
16, 20, 21, and 24 at Level I, for instance, are all labeled "Inferences"
but make very low level demands on student reasoning processes. Item 3
is characteristic:

    A small boy named Henry lived in the city. He had a pet
    dog, a kitten, and two birds in his home. Henry liked
    to play with the dog best.

    3. How many pets did Henry have?

       o one
       o two
       o three
       o four

The student is asked to demonstrate some ability to coordinate arithmetical
skills with literal comprehension reading skills. Item 5, however, makes
quite different demands upon the student's inferential abilities:

    5. When Henry takes care of his animals, he is

       o busy
       o lazy
       o sorry
       o worried

In the first place, the item stem introduces subordination, thus demanding
more linguistic prowess than the sentences in the text. Secondly, the
information necessary to make a judgment among the responses is not clearly
stated in the text. Thirdly, the semantic complexity of the responses
exceeds the level of vocabulary in the text and in the other sets of
responses accompanying the test passages. Fourthly, the item is obviously
passage independent. Given "animals" (plural) in the item stem, "busy" would
merely be descriptive of someone taking care of them. But any of the other
distractors is plausible. There is no information in the text that makes
"busy" any more correct than the other responses. The correct response can
be "inferred" just as easily from the item stem as from the test passage.

In summary, then, there is a wide range of inferential skills subsumed
under the label, "Inferences," in CAT Level I. Items 3, 16, 20, 21, and
24 make minimal demands on the reasoning powers of the student, emphasizing
instead the grammatical and semantic relations within the discourse. Items
5, 10, and 11, on the other hand, demand that the student reason beyond the
grammatical and semantic relations within the discourse. The majority of
the items, therefore, emphasize literal comprehension in spite of the label,
"Inferences," and hence correlate higher with the multiple-choice cloze than
predicted.




The same pattern is repeated at CAT Level II. There are eight items [...]
discussed, and requires only an understanding of the concept, "more than
one," and an ability to relate the concept to the passage.

    The children in Mrs. Kim's room were talking about
    how to [illegible] scrapbooks. Eva said, "I will bring
    [illegible] pictures." "I will bring some scissors," Monty said.
    Marie said, "And I will bring some paper."

    The children decided they would need more paste than
    they had. To make paste they would need water, [illegible],
    and salt. Eva said, "I will bring a pan to mix them in."

    16. Who will bring more than one thing?

       o Eva
       o Marie
       o Monty
       o Mrs. Kim



Item 15, on the other hand, like item 5 in CAT Level I, requires a
characterization of the action of the paragraph beyond its grammatical and
semantic relations:

    15. The children were

       o busy
       o playing
       o tired



Item 30 is passage independent:

    30. The windmill was turned by

       o a motor
       o the water
       o the wind


If the student has the semantic knowledge, he can obviously answer such a
question without reading the text. Even without such knowledge, a student
can "infer" the answer from the item stem. Item 40 is also passage
independent:

    40. A polar bear's hairy feet are especially useful on

       o ice and snow
       o [illegible] ground
       o sandy beaches
       o sharp stones

These passage independent items, however, access such a limited
semantic knowledge that a student who understands the vocabulary of the
test items will have no difficulty choosing the correct answer. In other
words, such test items, even though they are labeled "Inferences," seem
to access even lower level psycholinguistic processes than literal
comprehension. Items 30 and 40 are little more than simple vocabulary tests:
"windmill" is associated with "wind," and "polar bears" are associated with
"ice and snow." The majority of the "Inferences" items on CAT Level II,
then, are well within the psycholinguistic processes of literal
comprehension and therefore correlate more highly with the multiple-choice
cloze than expected.

In summary, there is again a range of psycholinguistic processes
subsumed under the label "Inferences" in Level II, but an examination
of the items reveals a preponderance of processes that fall within the
construct, literal comprehension; hence the high correlation with the two
experimental tests. (An examination of other disconfirmations in predicted
correlational levels revealed a similar misleading application of labels
to test items on the CAT.) CAT Levels I, II, III, and IV, therefore,
appear in general to access more literal comprehension processes than the
analysis of standardized, norm-referenced tests in Chapter I suggested.
Inconsistencies in the expected pattern of intercorrelations, then, on closer
examination reveal the consistency of the construct, literal comprehension,
and the consistency of the behavior of the two experimental tests in spite
of the misleading and inconsistent labels on the CAT subscores.

Further studies. The foregoing analysis is generally supportive of the
accuracy and validity of the construct, literal comprehension, and its two
experimental operationalizations, the multiple-choice cloze and the wh-item.
But the foregoing analysis has also revealed inconsistencies in the
application of labels to items in the CAT. Such inconsistencies are a
necessary consequence of a test that is not theory-based. More meaningful
analyses of the correlations among the literal comprehension tests and the
CAT subscores, therefore, depend on an item-by-item analysis of the CAT,
defining each item in relation to the explicit construct, literal
comprehension, rather than attempting to interpret vague, global labels like
"Facts" or "Inferences." Such an item-by-item analysis will also be a test
of the explicitness and consequently the utility of the construct itself,
that is, its ability to discriminate between items that appear to access
different psycholinguistic processes and items that also behave differently
in relation to the two operationalizations of the construct, literal
comprehension. Further studies, in other words, should lead to a refinement
of both the construct and its operationalizations as well as the ability to
identify what the tests actually measure.

Validity Across Alternate Measures of the Same Construct

The convergence of the two principal measures of literal comprehension
as indicators of the same construct is evaluated in Table 9.16, which shows
the parallel correlations of the multiple-choice cloze and wh-item tests
with the various CAT scores by CAT level in the study sample. As before,
confirmed predictions are underlined. Since the intent of the analysis
is to examine the differences in the absolute values of the correlations
of both measures of literal comprehension with relevant criteria, the
correlations of these two measures with the tests of IQ have been included

Table 9.16

Actual Zero-Order Correlations Among Literal Comprehension Test
Scores and California Achievement Test Subscores by CAT Level
(Language and Non-Language IQ Scores
Included as Additional Criteria)

Vocabulary Subscores

Sentence-Pic. Assoc.
Beginning Sounds
Ending Sounds
Letter Recognition
Word Form
Picture-Word Assoc.
Word Recognition
Words in Context

Comprehension Subscores

Facts
Interpretation
Relationships
Generalizations
Inferences
Reading-General
Reading-Soc. Studies
Reading-Science
Reading-Math.

Language Subscores

Standard English
Sentence Structure
Sent. Parts and Funct.
Transformation

Language IQ
Non-Language IQ



CAT I

MCC(a) WH(b)



,07 
,45 
,34 
,11 



.28 



.55 
,39 
.67 



.06 
.46 
-33 
jl2 
,3 4 
.60 
,44 
.71 



.61 .65 



.34 .62 



, SO . 34 



.45 



.35 
.45 



.41 
.48 



CAT II



MCC WH 



,56 .62 
,75 .78 



^76 .75 

.J3 .70 

,73 .67 

,73 .70 



.73 .69 



,54 .53 
,4a .46 



(a) Multiple-choice cloze test.



CAT III



MCC WH 



.70 .64 



i67 
.70 
"43 
.51 
.35 
.62 
.58 
.56 
.51 



.3? 
.31 
.28 



.64 
.44 
.47 
.32 
.57 
.59 
.36 
.4£* 



.46 .41 



.32; 
,23 
,23 



,62 .55 
.52 .50= 



CAT IV(c)



.51 



*42 
,44 
.33 
.43 
.41 
.38 
,42 
742 
.39 



.32 
.41 
.50 
.24 



(b) Wh-item test.

Note. Underlined values were within level of correlation predicted.

(c) Grade 8 students only.






at the bottom of the table.

Examination of the correlations in Table 9.16 reveals a pattern of
remarkable consistency in the way the two very different indicators of the
construct, literal comprehension, behave in relation to the range of
psycholinguistic skills accessed by the CAT subscores. The differences
between multiple-choice cloze and wh-item correlations are .05 or less for
36 out of 47 CAT subscores. The differences between 46 out of 47 are .08
or less. The difference in the one remaining subscore, Sentence Parts and
Functions, Level IV, is .15. The pattern of confirmation is thus also very
consistent between the two tests of literal comprehension, there being
only 2 out of 48 instances where there is lack of agreement on confirmation
or disconfirmation. A similar pattern of consistency or convergence holds
in the correlations of the wh-item and multiple-choice cloze tests with
the measures of IQ.
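
The convergence check described above is a count of absolute differences between paired correlations. The Python sketch below illustrates the calculation on the eight Level I vocabulary subscores only. The pairs are as read from this copy of Table 9.16, where several digits are hard to make out, so they should be treated as illustrative rather than authoritative; the names `LEVEL_I_VOCABULARY` and `count_within` are assumptions made for the example.

```python
# (multiple-choice cloze, wh-item) correlations with CAT Level I vocabulary
# subscores, stored in integer hundredths to avoid floating-point error.
# Values are readings of an imperfect scan of Table 9.16.
LEVEL_I_VOCABULARY = {
    "Sentence-Picture Assoc.": (7, 6),
    "Beginning Sounds": (45, 46),
    "Ending Sounds": (34, 33),
    "Letter Recognition": (11, 12),
    "Word Form": (28, 34),
    "Picture-Word Assoc.": (55, 60),
    "Word Recognition": (39, 44),
    "Words in Context": (67, 71),
}


def count_within(pairs, tolerance):
    """Count pairs whose absolute difference is at most `tolerance` hundredths."""
    return sum(abs(mcc - wh) <= tolerance for mcc, wh in pairs.values())


within_05 = count_within(LEVEL_I_VOCABULARY, 5)
within_08 = count_within(LEVEL_I_VOCABULARY, 8)
print(f"{within_05} of {len(LEVEL_I_VOCABULARY)} differ by .05 or less")
print(f"{within_08} of {len(LEVEL_I_VOCABULARY)} differ by .08 or less")
```

For these eight pairs, seven differ by .05 or less and all eight by .08 or less, which is in line with the pattern the text reports for the full set of 47 subscores.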

The negligible differences in the way the two tests of literal
comprehension compare in correlations across CAT levels and subscores is even
more remarkable considering the differences in format and content between
the two experimental tests. Besides radical differences in item type and
format, the passages vary in content and length between the two experimental
tests. Multiple-choice cloze passages are never more than 70 words long,
while wh-item test passages contain as many as 220 words. Moreover, the
content of the passages on the two tests is completely different. No passage
that appears on the wh-item test appears anywhere on the multiple-choice
cloze test. But both tests are designed to measure literal comprehension as
defined in the construct regardless of variations in the subject matter,
style, or length of the reading passages.

These data, taken together with the results of previous analyses
reported here of the correlations among subscores and total scores on the
multiple-choice cloze and wh-item tests, provide rather strong confirmation
of the validity and generalizability of the trait in question.

Conclusions

This section of the report has presented some research data that
reflect on the reliability and validity of two alternative approaches to
the measurement of literal comprehension. One of these measures is a
substantial modification of the cloze procedure into a format referred to as
the multiple-choice cloze. The other is a systematic method for writing
multiple-choice comprehension questions, based on the wh-transformation.
The tests assembled for this research in the wh-item format were intended
as a criterion for studying the validity of the multiple-choice cloze,
because it was judged that no adequate criterion for the construct
underlying the cloze test existed. At the outset of this research, confidence
in the wh-item format as an adequate operational translation of the
construct of literal comprehension was hedged, primarily because it appeared
that this item format might tend to measure other traits unrelated to
comprehension (e.g., test-wiseness, syntactic competence only, etc.).

One generalization that appears justified from this preliminary research
is that the multiple-choice cloze and wh-item test formats are equally
valid measures of literal comprehension as defined. This is an unexpected
and relatively powerful conclusion that reflects strongly and positively
on the reasoning guiding the test development activity in this research.
It would appear that Carroll's (1972) original suggestion that reading
comprehension can be separated into at least two basic factors, one that has
to do with the literal interpretation of the text and one that has to do
with reasoning or thinking beyond the literal meaning of the text, receives
some support from the data presented here. But comparison with other
tests that stress inferential processes more than the CAT will be needed
to pursue this possibility further.

The results of this research also indicate that the hypothetical
advantages of the multiple-choice cloze format discussed in Chapter V (e.g.,
its objectivity, economy, flexibility, unidimensionality, and
domain-referencing) can be realized. Tests assembled from the organized pool
of multiple-choice cloze passages, referred to as the Test Development
Notebook or TDN, proved to be highly reliable under the circumstances
predicated for the tests, and it appeared that any cloze test of equal length
assembled from the TDN would prove to be equally reliable. The preliminary
attempts to scale the multiple-choice cloze passages based on the Rasch
measurement model also indicated that the very desirable scaling features of
this model could be broadly applied to the total pool of cloze passages.

The results of the preliminary analyses of the data available on the
new measures of literal comprehension are thus strongly supportive of
continuation of this research as planned in broad outline in Chapter VI.
It is further apparent from these research results that the methodology
projected in this chapter for continued study of the cloze format offers
considerable potential for further clarifying the testing of reading
comprehension from a psycholinguistic point of view. Future stages of this
research, as outlined in Chapter VI, will expand the research methodology
to include: (a) the measurement of variation in semantic and syntactic
factors sampled in the range of cloze passages in the TDN; and (b) the
measurement of additional comprehension factors (main idea and title
questions) modeled after the approach taken with the current set of wh-items.
In contrast with conventional measures of reading comprehension, the
multiple-choice cloze and wh-item provide for the study of specific types
of comprehension test items in the context of a carefully controlled and
specifiable scale of passage difficulty. 
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RULES FOR APBLlGa:iON OF THE CLOZE PIRDCEDURE 



%^ passage Select ion Gffittrla 
Length 

a. Grade Is 20-40 vfocds 
b# Grada 2l 35-50 weds 
c. Grades 3-1 2i 49-^80 -words 
2» sidelines gl'^^ the tainlinm number of w^ds mcassary to 
produGa a clo^abla paisage* HowOTe^,, a passage may extand 
beyond the guldallnas If this is meftssary to maet tha 
crlt63?ion o£ eohereiice* 

   B. Quality--Coherence

      1. Passages must be coherent--one sentence following another in
         connected discourse.
      2. When the discourse is interspersed with many examples,
         problems, illustrations, etc., such examples may be deleted
         [and the remaining text joined] together to form unified
         passages.
      3. The following may be deleted to meet the criterion of
         coherence:

         a. transitional phrases
         b. references to charts, illustrations, diagrams, etc.
         c. examples and problems.

II. Titles

   A. Titles must be descriptive of or clearly related to the content
      of the passage.

   B. Titles may be assigned in any one of three ways:

      1. Use the title of the original source of the passage.
      2. Take a series of words verbatim from the passage.
      3. Derive a title consisting of words taken from the passage
         but not taken verbatim.

III. Readability--calculate passage readability score using the procedure
     described in the Readability Manual (Form 80).*

IV. Clozing the Passage

   A. Rules for Deletions

      1. Grades 1 and 2--cloze only nouns and verbs
      2. Grades 3-12--cloze only nouns, verbs, adjectives, and adverbs
      3. Do not cloze:

         a. Function words (conjunctions, prepositions, interjections,
            auxiliary verbs)
         b. Pronouns
         c. Proper nouns
         d. Adjectives used in proper names (e.g., Little Red Hen)
         e. Hyphenated words
         f. Arabic or Roman numerals, e.g., 123, XXV
         g. Abbreviations
         h. Phonemes (the smallest distinctive unit of speech, e.g.,
            aw, oo, ah)
         i. Foreign words*
         j. Any form of the verb to be (e.g., is, are, were, etc.)
         k. Idioms (i.e., any words for which no distractors may
            be found which are both grammatically plausible and
            semantically implausible. See VI. D.1.b. and c.)

            Examples:

            "I know more about that than anyone else."
            "It means knowing how they yawn or stretch. . ."
            "Salty was another member of the crew of the
            Sea Watch. . ."
            "How do you know that she'll want to be sold?"

   B. The First Deletion

      1. The first deletion must be made at word 6, 7, 8, 9, or 10
         of the passage.**

      2. Use a table of random numbers or permutation table to
         determine first deletion.

         a. To assure the maximum degree of randomness, one must
            proceed in a consistent fashion through the random numbers
            table. That is, one must keep track of each number
            selected so that one can resume using the random numbers
            table at the proper place.

         b. If it is apparent, as will often be the case, that only
            one word is clozable, dispense with the random numbers
            table.

      3. Where the number taken from the random numbers table corresponds
         to a word in the passage which is not clozable (see IV. A),
         a clozable word from word 6 through 10 must be chosen.

      4. If no word from words 6 through 10 is clozable, the passage
         must be rejected.

*Use Webster's Seventh New Collegiate Dictionary.
**See special rules found in Readability Manual (Form 80).
   C. Subsequent Deletions

      1. Grades 1 and 2--from the first clozed word, count forward eight
         words. If this eighth word can be clozed, circle it. If not,
         continue counting forward to a word which can be clozed. Circle
         it. Continue this process until 3 deletions have been made for
         grade 1 and 5 for grade 2.

         a. Wherever possible, leave seven words between deletions at
            grades 1 and 2.
         b. If the eighth word cannot be clozed, it is also permissible
            to go back one or two. In no case at grades 1 and 2 may
            there be fewer than five words between deletions.
         c. It is permissible to leave as many as 11 words between
            deletions, but in no more than two instances per passage
            should there be more than 7 words between deletions.

      2. Grades 3-12--from the first clozed word, count forward five
         words. If this fifth word can be clozed, circle it. If not,
         continue counting forward to a word which can be clozed. Circle
         it. Continue this process until 10 deletions have been made.

         a. Wherever possible, leave 4 words between deletions.
         b. If the fifth word cannot be clozed, it is permissible
            to go back one, thus leaving three words between
            deletions, but in no more than 2 instances per passage
            can there be fewer than 4 words between deletions.
         c. It is permissible to leave as many as 11 words between
            deletions, but in no more than two instances per passage
            should there be more than 7 words between deletions.
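
The counting procedure in IV. B and IV. C can be sketched in code. This is a simplified illustration only, not the project's own procedure. The function name `choose_deletions` and the `is_clozable` predicate (standing in for the judgments of IV. A) are hypothetical, Python's `random` module replaces the printed random numbers table, and the go-back allowances and gap limits in rules a-c are not modeled.

```python
import random

def choose_deletions(words, is_clozable, step, n_deletions, rng=None):
    """Return 1-based positions of words to delete, or None if rejected.

    step is 8 for grades 1 and 2 and 5 for grades 3-12 (rule IV. C).
    is_clozable is a hypothetical predicate standing in for rule IV. A.
    """
    rng = rng or random.Random()
    # Rule IV. B: the first deletion falls on a clozable word among words 6-10.
    candidates = [p for p in range(6, 11)
                  if p <= len(words) and is_clozable(words[p - 1])]
    if not candidates:
        return None  # Rule IV. B.4: the passage must be rejected.
    positions = [rng.choice(candidates)]
    # Rule IV. C: count forward `step` words; if that word cannot be
    # clozed, continue counting forward to the next clozable word.
    while len(positions) < n_deletions:
        p = positions[-1] + step
        while p <= len(words) and not is_clozable(words[p - 1]):
            p += 1
        if p > len(words):
            break  # the passage ran out before all deletions were made
        positions.append(p)
    return positions
```

For a 60-word grade 3-12 passage in which every word is clozable, `choose_deletions(words, lambda w: True, 5, 10)` returns ten positions spaced five words apart, which leaves four words between deletions as rule 2.a prefers.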

   A. Core lists--words found in common usage. Core lists are divided by:

      1. Grade level--these lists are cumulative. That is, all words found
         on grade 1 lists are also on grade 2 lists; all words found on
         grade 2 lists are on grade 3 lists, etc.
      2. Part of speech--nouns, verbs, adjectives, adverbs. Words
         that function as different parts of speech may appear on
         more than one list (e.g., humor appears on both the noun and
         verb lists).

   B. Content lists--words which are associated with a particular area
      of the curriculum. These areas include: language arts, social
      studies, science, math. Like the core lists, the content lists
      are divided by grade level and part of speech.



   A. Follow these steps in determining whether to use the core or content
      lists:

      1. Distractors for clozed words are to be taken from core lists except
         when clozed words appear to be characteristic of particular
         curriculum areas.

      2. When a clozed word seems to belong to the special vocabulary of a
         particular curriculum area, the American Heritage Word Frequency
         Book must be consulted for corroboration.

         a. Any word appearing in a single curriculum area with a frequency
            of 7 or higher is considered a content word in that curriculum
            area.






         b. Any word appearing in two curriculum areas with a frequency of
            7 or higher is considered a content word in both curriculum
            areas.

         c. A word appearing in three or more curriculum areas, but
            occurring twice as often in one area as in any other, is
            considered a content word in that area where it appears
            most frequently.

      3. If a clozed word is considered a content word in one
         curriculum area, choose distractors from the corresponding content
         area word list.

      4. If a clozed word is considered a content word in more than
         one curriculum area, choose distractors from the content area
         word list which corresponds to the content of the passage itself.

   B. Distractors must be taken from word lists at the same grade level as
      the passage source (e.g., if a passage was taken from a grade 3 text,
      distractors must be taken from grade 3 word lists).
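
The core-versus-content decision above can be expressed as a small function. This is a sketch under stated assumptions: the function name `content_areas` and the dictionary input are hypothetical, and the code assumes that an area only "counts" toward rules a-c when the word's frequency there is 7 or higher, which the source states for rules a and b but leaves open for rule c.

```python
def content_areas(freqs, threshold=7):
    """Return the curriculum areas in which a word is a content word.

    freqs maps a curriculum area name to the word's frequency in that
    area (hypothetical input; the source takes these counts from the
    American Heritage Word Frequency Book). An empty list means the
    distractors come from the core lists.
    """
    # Assumption: only areas with a frequency of 7 or higher are counted.
    hits = {area: f for area, f in freqs.items() if f >= threshold}
    if len(hits) <= 2:
        return sorted(hits)  # rules a and b
    # Rule c: three or more areas; one must occur twice as often as any other.
    top = max(hits, key=hits.get)
    others = [f for area, f in hits.items() if area != top]
    if all(hits[top] >= 2 * f for f in others):
        return [top]
    return []
```

When the function returns more than one area, rule 4 still applies: the list matching the content of the passage itself is the one to draw distractors from.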

   C. Use of Part of Speech Lists

      1. Determine the contextual function (i.e., part of speech)
         of the deleted word, and take distractors from the
         corresponding part of speech list.

      2. If the deleted word is a verbal, choose distractors from
         the verb list.

   D. Rules for Assigning Distractors

      1. General

         a. Distractors may not be synonymous with deleted words
            (e.g., avoid: quick, fast, swift; drunk, inebriated,
            intoxicated; lethargy, lassitude, enervation).

         b. Distractors may not be semantically plausible within the
            context of the entire passage intact.

            Examples:

            "Everyone was bargaining merrily, loudly back and
            forth. . ."

            "Six more weekends, years would pass. . . ."

            "Some days the hunting was good: little, enough
            animals were killed to feed all of the people."

            "She was tall, wondrous, long-haired and dreamily
            gentle. . . ."

            "Quickly he pulled his canoe up to a snug, safe
            place on the shore. . . ."

            ". . . the boy tightly held the book that had caused
            him to stay, hurry out so late."

            ". . . he as a wolfer and I as a Mountie desired, covered
            pretty much the same territory."

N.B. An inability to select semantically implausible distractors may result
     from insufficient context within the passage. The passage must then
     be reclozed or discarded.




         c. Distractors may not be grammatically implausible within the
            context of the entire sentence intact.

            "Mrs. Carver gave, asked, hit him an old speller."
            not
            "Mrs. Carver gave, pretended, became him an old
            speller."

            "The people of Boston had had it with British
            rule, money, arrogance."
            not
            "The people of Boston had had it with British
            rule, palace, prince."

         d. At least one distractor should resemble the deleted word
            in length (e.g., [...] electricity; but not--home, lamp,
            ball, rook, synecdoche).

      2. Nouns

         a. Distractors must agree in number with the deleted noun
            (e.g., if the deleted noun is a plural, such as cows,
            then distractors should all be pluralized--bats,
            buildings, trains, teachers).

         b. Distractors must agree with the articles a and an when
            they precede deleted nouns (e.g., a volleyball, an
            orangutan).

         c. When deleted nouns are not preceded by an article,
            distractors must also be able to function without a
            preceding article (e.g., Money makes the world go around;
            anthropology is very interesting. But not--Car makes
            the world go around; tree is very interesting.)

      3. Verbs--Distractors must agree in person, number, and tense with
         the deleted verb (e.g., plays, swims; played, swam;
         played, swum).

      4. Adjectives--Distractors must agree in degree with the deleted
         adjectives (e.g., funny, happy; funnier, happier; funniest,
         happiest).

   E. Number of distractors by grade level

      1. Grade 1--2 distractors.
      2. Grades 2 and 3--3 distractors.
      3. Grades 4 and up--4 distractors.
MAIN IDEA AND WH-ITEM COMPONENT OF THE TDN

The main idea and wh-item component of the TDN contains 15 passages
for each of the first 20 readability levels established from the Spache and
Dale-Chall formulas. The lengths of the passages vary systematically by
level. The specified passage lengths and the readability scores for all 20
levels are shown in Table 5.6. Additional specifications concerned the unity
of each passage, its utility as a source for main idea and detail questions,
and its suitability in content, style, and vocabulary for the pupils with
whom it would normally be used. The vocabulary of the passages was controlled
to a great extent by the word lists of the readability formulas. Vocabulary
was further controlled by the use of Harris and Jacobson's (1972) basal-reader
or "core" word lists for levels 1-12 (grades 1-6) and the American Heritage
Word Frequency Book (1974) for levels 13-20. These references served as
guides for determining the acceptability of individual words in passages and
in item responses.

Passage material was taken from existing criterion-referenced tests
(the Duval County, Florida, tests for Individually Paced Instruction in
Reading and CAM tests used in various districts in the State) and from a
variety of books and magazines. A substantial amount of new material was
written. Existing test passages were edited extensively to meet the passage
specifications. Modifications in excerpts from books and magazines were
limited to a few individual word changes to meet the vocabulary requirements
of the readability formulas. An effort was made to have a balance of
fictional and non-fictional passages and to have diversity of subject matter
within these broad categories.



Table 5.6

Length and Readability Score Specifications
Literal Comprehension Passages

Level    Words         Readability Score

  1       26 - 35       1.0  - 1.4
  2       36 - 45       1.5  - 1.9
  3       46 - 55       2.0  - 2.4
  4       56 - 65       2.5  - 2.9
  5       65 - 75       3.0  - 3.4
  6       76 - 85       3.5  - 3.9
  7       86 - 95       4.50 - 4.74
  8       96 - 105      4.75 - 4.99
  9      106 - 115      5.00 - 5.24
 10      116 - 125      5.25 - 5.49
 11      126 - 135      5.50 - 5.74
 12      136 - 145      5.75 - 5.99
 13      146 - 155      6.00 - 6.24
 14      156 - 165      6.25 - 6.49
 15      166 - 175      6.50 - 6.74
 16      166 - 175      6.75 - 6.99
 17      166 - 220*     7.00 - 7.24
 18      166 - 220*     7.25 - 7.49
 19      166 - 220*     7.50 - 7.74
 20      166 - 220*     7.75 - 7.99

*At the four highest levels the word range was extended in order to have
fictional passages with the required readability scores. Non-fictional
passages were limited to a maximum of 185 words.
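
The specifications in Table 5.6 lend themselves to a simple lookup. The sketch below is illustrative only: the names `WORD_RANGES`, `readability_band`, and `passage_fits` are hypothetical, the word ranges are entered as read from the table, and the 185-word limit on non-fictional passages at the four highest levels is not modeled.

```python
# Word ranges per level as read from Table 5.6 (level 5 is printed as 65-75).
WORD_RANGES = [
    (26, 35), (36, 45), (46, 55), (56, 65), (65, 75),
    (76, 85), (86, 95), (96, 105), (106, 115), (116, 125),
    (126, 135), (136, 145), (146, 155), (156, 165), (166, 175),
    (166, 175), (166, 220), (166, 220), (166, 220), (166, 220),
]

def readability_band(level):
    """Return the (low, high) readability score band for levels 1-20."""
    if 1 <= level <= 6:
        low = 1.0 + 0.5 * (level - 1)    # half-grade bands: 1.0-1.4 ... 3.5-3.9
        return (round(low, 2), round(low + 0.4, 2))
    if 7 <= level <= 20:
        low = 4.50 + 0.25 * (level - 7)  # quarter-grade bands from 4.50
        return (round(low, 2), round(low + 0.24, 2))
    raise ValueError("level must be between 1 and 20")

def passage_fits(level, n_words, score):
    """Check a passage's length and readability score against its level."""
    low_s, high_s = readability_band(level)
    low_w, high_w = WORD_RANGES[level - 1]
    return low_w <= n_words <= high_w and low_s <= score <= high_s
```

For example, `passage_fits(7, 90, 4.6)` is true, while a 90-word passage scoring 4.2 fits no level in the table, because the printed bands jump from 3.9 at level 6 to 4.50 at level 7.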
In systematizing the writing of test items, 12 different types of
questions were identified: 4 for main idea and 8 for details. (Only detail
items were used in the test administration under discussion.) Rules for
constructing these items are contained in "Item-Writing Format and Procedure,
Main Idea and Wh- Items." Given 12 possible items, the maximum number of
items that could have been written for the 300 passages was 3,600. Because
all questions could not be asked on every passage, the number produced was
closer to 3,000.
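
The counts in this paragraph follow from figures given earlier in the section (15 passages at each of 20 readability levels). A short check of the arithmetic, with variable names chosen here for illustration:

```python
levels = 20                 # readability levels in the component
passages_per_level = 15     # passages written for each level
main_idea_types = 4         # main idea question types
detail_types = 8            # detail (wh-) question types

passages = levels * passages_per_level                     # 300 passages
max_items = passages * (main_idea_types + detail_types)    # 3,600 items
approximate_yield = 3000 / max_items                       # roughly five-sixths
```

The last line uses the report's rounded figure of "closer to 3,000" items, so the resulting proportion is only approximate.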
ITEM WRITING FORMAT AND PROCEDURE, MAIN IDEA AND WH- ITEMS

I. Main Idea Questions (four possible questions)

I. 1.   Title[1] Questions (two possible questions)
        Format, Levels 1-6:  The best title for this story is a, b, c.
                Levels 7-20: The best title for this selection is a, b, c, d.
I. 1.1. Given a passage:
I. 1.2. Write, if possible, a question with verbatim[2] responses.
I. 1.3. Write, if possible, a question with derived[3] responses.
I. 1.4. Write only plausible distractors; write parallel distractors
        when possible; write distractors that closely match the
        correct response in number of words; write distractors that
        are appropriate to the level of the passage.
I. 1.5. If distractors are not equal in length, write at least one
        distractor which closely matches the correct response in
        length.
I. 1.6. Avoid negative items except when required by passage.

I. 2.   Main Idea[4] Questions (2 possible questions)
        Format, Levels 1-6:  What is this story mostly about? a, b, c.
                Levels 7-20: The main idea of this selection is a, b, c, d.
I. 2.1. Given a passage:
I. 2.2. Write, if possible, a question with verbatim responses.
I. 2.3. Write, if possible, a question with derived responses.
I. 2.4. Write only plausible distractors; write parallel distractors
        when possible; write distractors that closely match the
        correct response in number of words when possible; write
        distractors that are appropriate to the level of the passage.
I. 2.5. If distractors are not equal in length, write at least one
        distractor which closely matches the correct response in
        length, or write all responses of unequal length.
I. 2.6. Avoid negative items except when required by passage.

II. Detail Questions
        Format: Levels 1-4:  3 responses
                Levels 5-20: 4 responses
II. 1.   Given a passage:
II. 2.   Randomly take a sentence number from a permutation block
         representing all possible sentences in the passage (in this
         case, 1-16).
II. 2.1. Take numbers from left to right across the block and so on
         down through the entire block if necessary; if block is
         exhausted before the passage, use next block; always start
         a passage with a new block.
II. 2.2. If number taken from block does not represent a sentence in
         the passage (e.g., 15 when there are only 10 sentences),
         take the next number.
II. 3.   Starting at the top, take a detail question from the following
         alphabetical list (see attachment for types and examples of
         detail):

            HOW
            WHAT - noun, pronoun
            WHAT - verb
            WHEN
            WHERE
            WHICH
            WHO(M)
            WHY

II. 4.   If possible, write the detail question about the sentence taken
         in II. 2.
II. 4.1. Write clear, concise questions in colloquial English,
         changing the wording of the sentence as little as possible.
         (Exception: replace pronouns with their referents.)
II. 4.1.1. Begin each question with the appropriate detail word
         (e.g., how, what, etc.).
II. 4.2. Avoid anaphora when possible.[5]
II. 4.3. Avoid inference.[6]
II. 4.4. Ask each detail question only once per passage.
II. 4.5. If possible, ask all 8 detail questions of each passage.
II. 4.6. Ask only one detail question per sentence unless the
         sentence or passage is rich in detail and there are few
         sentences, in which case repeat II. 2. from a new permutation
         block until all 8 wh-questions have been asked if possible.
II. 5.   If the detail question cannot be asked of the sentence taken in
         II. 2. (e.g., there is no answer to a "how" question), go on to
         the [next] sentence if possible.
II. 5.1. If a detail question cannot be asked of a given sentence,
         return to that same detail question first on the next
         sentence taken (e.g., if "how" is skipped, return to "how"
         first on the next sentence).
II. 6.   Take the next sentence number in the permutation block and ask
         the next detail question until all the detail questions are
         exhausted if possible. (Some passages may not be rich enough in
         detail to provide bases for all eight detail question types.)
II. 7.   If possible, take the distractors from the passage verbatim.
II. 7.1. Write only grammatically and semantically plausible
         distractors.
II. 7.2. Write parallel distractors when possible.
II. 7.3. Write distractors that closely match the correct response
         in number of words.
II. 7.4. If distractors are not parallel or equal in length, write
         at least one distractor that parallels or matches in length
         the correct response.
II. 7.5. Write no distractors that could be correct in the context
         of the passage.
II. 7.6. Write distractors that are appropriate to the level of the
         passage.
II. 8.   If distractors cannot be taken verbatim from the passage:
II. 8.1. Take distractors from the passage, changing them as little
         as possible in order to make them parallel and grammatically
         and semantically plausible (e.g., add determiners, adverbs,
         subordinators, etc.; or change verb tense, number, etc.;
         delete words; join words from scattered places in the
         passage).
II. 8.2. If parallel, plausible distractors cannot be found in the
         passage, or if such distractors make the correct response
         debatable, take distractors from outside the passage. Such
         distractors must meet all the criteria in II. 7.1. to
         II. 7.6. above.
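
Rules II. 2 through II. 6 describe a loop over sentence numbers and question types, which can be sketched as follows. The sketch rests on one reading of II. 5 and II. 5.1: when a question type cannot be asked of a sentence, the next type in the list is tried on that sentence, and the skipped type is tried first on the next sentence. The function name `assign_questions` and the `can_ask` predicate (the item writer's judgment) are hypothetical, and switching to a new permutation block is not modeled.

```python
DETAIL_TYPES = ["HOW", "WHAT (noun, pronoun)", "WHAT (verb)", "WHEN",
                "WHERE", "WHICH", "WHO(M)", "WHY"]

def assign_questions(n_sentences, block, can_ask):
    """Pair sentence numbers with detail question types.

    block is one permutation block read left to right (a list of ints).
    can_ask(sentence, qtype) is a hypothetical stand-in for the item
    writer's judgment. Returns (sentence_number, question_type) pairs.
    """
    pairs = []
    remaining = list(DETAIL_TYPES)  # II. 4.4: each type once per passage
    for number in block:
        if not remaining:
            break                   # II. 6: all detail questions exhausted
        if number > n_sentences:
            continue                # II. 2.2: no such sentence, take next number
        # Types skipped earlier stay at the front of `remaining`, so they
        # are tried first on this sentence (one reading of II. 5.1).
        for qtype in remaining:
            if can_ask(number, qtype):
                pairs.append((number, qtype))  # II. 4.6: one per sentence
                remaining.remove(qtype)
                break
    return pairs
```

With a block beginning 3, 15, 1, 2 and a ten-sentence passage in which "how" cannot be asked of sentence 3, the sketch asks a "what" (noun) question of sentence 3, skips 15, and returns to "how" on sentence 1.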



Footnotes

[1] Title refers to the "subject" or "topic" of the passage (a noun with
or without modifiers).

[2] Verbatim means that the words are reproduced exactly as they are in
the passage. The only exceptions would be the replacement of pronouns by
their referents or the addition of determiners.

[3] Derived means one or more words are changed or added to the words in
the passage or that word order is changed.

[4] Main idea refers to a complete sentence incorporating the essential
point(s) of the selection.

A verbatim main idea would be a "topic sentence" or "thesis statement."

A derived main idea would supply a topic sentence or thesis statement
where there is none (or is a variation on the topic sentence or thesis
statement in the passage).

[5] The referent for a pronoun may be in preceding sentences. Adverbs
like "soon" or "then" may refer to actions or situations in preceding
sentences.

[6] The only exceptions would be passages where the logical relationship
between two or more sentences is clearly implied. For example: "Carmen is
writing to her friend, Carlos. Next Saturday will be his birthday." Why
is Carmen writing to Carlos? Because next Saturday will be his birthday.
Because is not in the passage but is logically and clearly implied as an
expression of the relationship between the two sentences. "Tim, the turtle,
has a new shell. He is very happy." Why is Tim happy? Because he has a
new shell.



                          Example Q.                      Example A.

How    Adverbial          How many . . .?                 30, 40, etc.
                          How tall was the tree?          very tall
                          How are shoes made?             with leather
                          How did the brook flow?         rapidly
       Verb               How does John get to school?    drives
       Adjectival         How did Mary look?              sad, happy, pretty, etc.

What   Noun, Pronoun      What did Jim need?              help
                          What did John eat?              lunch, ice cream, it
                          What swam fast?                 the fish

What   Verb               What did Tim do?                ran, ate, slept, fell, etc.
                          What does Jane do?              sings, laughs, etc.
                          What was Harry doing?           thinking, talking, etc.

When   Adverbial-result   When did the popcorn pop?       when the steam inside expanded
       Adverbial-time     When did the boys come home?    in the evening, after school,
                                                          at 4 o'clock, etc.

Where  Adverbial          Where did Jack go?              for a walk, outside, to town,
                                                          to New York

Which  Adjectival         Whose cat was it?               Tom's, Mary's, John's
                          Which hat did Davy wear?        coonskin, blue, floppy, big
                          What kind of outfit did he      new, old, dirty
                          wear?
                          What color was Bill's shirt?    blue, red, white

Who    Noun, person       Who played ball?                Herbie, the boys, the players,
       name (or pronoun                                   he, they, etc.
       standing for       Whom did the car hit?           Herbie, them, him, her, Mary,
       person)                                            etc.

Why    Adverbial-cause,   Why did Tom trip?               because his shoes were too big
       explicit
       Implicit           Why did the ice melt?           The sun got very hot.
Table 8.6

Easiness of Passages on Multiple-Choice Cloze Exercises by Form
Easiness: Percent of Responses Correct

       Mean
       easiness                                Grade
Form   grades 1-3   Passage               1    2    3   1-3

   1   58.17        01-01-01-01-01-035   58   78   87   75
                    01-02-01-01-01-0?4   59   85   88   78
                    02-04-01-01-01-020   --   65   81   61
                    03-05-01-01-01-007   --   --   78   57
                    04-07-01-01-02-012   --   --   63   46
                    0?-09-01-01-05-039   --   29   46   32

   2   56.50        01-01-01-01-01-003   59   80   95   78
                    01-02-01-01-01-040   60   79   90   77
                    02-04-01-01-03-038   --   68   79   62
                    04-06-01-01-01-003   --   --   63   43
                    04-07-01-01-05-019   --   --   59   41
                    04-09-01-01-05-036   --   --   --   38

   3   60.83        01-01-01-01-01-004   70   --   98   83
                    01-02-01-01-01-041   71   79   94   81
                    02-04-01-01-01-023   --   --   --   69
                    03-05-01-01-01-009   --   --   77   57
                    04-08-01-01-01-020   --   --   --   35
                    05-09-01-01-01-016   --   --   --   40

   4   60.67        01-01-01-01-01-034   58   86   89   78
                    01-02-01-01-01-037   54   76   94   75
                    02-04-01-01-01-030   --   47   --   63
                    03-05-01-01-01-008   --   --   --   53
                    03-07-01-01-03-029   --   --   --   54
                    04-09-01-01-01-029   --   57   --   41

   5   61.50        01-01-01-01-01-005   --   --   87   81
                    01-02-01-01-01-027   62   83   87   77
                    02-04-01-01-05-040   40   67   74   61
                    03-06-01-01-02-020   29   59   65   51
                    05-07-01-01-04-007   24   62   72   54
                    05-09-01-01-02-014   25   48   62   45

   6   59.50        01-01-01-01-01-002   58   81   91   77
                    01-02-01-01-01-039   --   82   93   79
                    01-03-01-01-01-047   55   79   90   74
                    04-06-01-01-01-004   23   48   66   45
                    04-07-01-01-05-018   25   54   64   47
                    06-09-01-01-01-003   18   40   49   35
Table 8.6 (Continued)

Easiness: Percent of Responses Correct

       Mean
       easiness                                Grade
Form   grades 1-3   Passage               1    2    3   1-3

   7   59.83        01-01-01-01-01-008   69   82   98   83
                    01-02-01-01-01-026   66   79   95   80
                    02-03-01-01-02-014   38   53   87   60
                    04-06-01-01-01-002   28   48   70   49
                    05-07-01-01-04-006   26   45   77   50
                    05-10-01-01-01-024   20   29   61   37

   8   56.83        01-01-01-01-01-009   55   74   80   71
                    01-02-01-01-01-023   46   89   86   75
                    01-03-01-01-01-033   39   74   80   66
                    03-06-01-01-01-016   24   50   59   45
                    04-08-01-01-02-026   20   48   57   43
                    05-10-01-01-01-025   16   44   59   41

   9   60.17        01-01-01-01-01-007   51   85   93   78
                    01-02-01-01-01-036   46   --   81   67
                    02-04-01-01-01-017   39   79   92   71
                    03-06-01-01-02-003   29   62   74   56
                    05-08-01-01-01-011   21   50   72   49
                    04-09-01-01-05-037   22   39   57   40

  10   58.50        01-01-01-01-01-001   70   76   87   78
                    01-02-01-01-01-042   61   71   95   76
                    02-04-01-01-01-036   43   69   86   67
                    03-06-01-01-02-004   28   47   66   48
                    05-07-01-01-03-005   29   39   71   48
                    04-09-01-01-01-030   --   27   58   34

  11   61.00        01-01-01-01-01-006   51   72   81   69
                    01-02-01-01-01-018   49   82   85   73
                    02-03-01-01-01-001   41   75   86   69
                    03-06-01-01-02-021   34   58   80   59
                    05-07-01-01-01-003   27   46   68   49
                    04-09-01-01-02-035   23   46   71   48

  12   60.00        01-02-01-01-01-046   51   67   83   68
                    01-02-01-01-01-021   62   74   91   76
                    02-03-01-01-01-004   54   71   87   71
                    04-05-01-01-02-001   33   --   75   53
                    05-07-01-01-01-004   --   45   66   46
                    04-09-01-01-02-034   29   46   62   46
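
The "mean easiness" reported for each form in Table 8.6 appears to be the simple average of the six passages' grades 1-3 percentages. This is an inference from the printed figures rather than a definition stated in the text. A minimal check using the Form 7 and Form 8 entries (the function name is chosen here for illustration):

```python
def mean_easiness(percent_correct):
    """Average the passages' percent-correct values, rounded to two places."""
    return round(sum(percent_correct) / len(percent_correct), 2)

# Grades 1-3 column for the six passages of each form, as read from Table 8.6.
form_7 = [83, 80, 60, 49, 50, 37]
form_8 = [71, 75, 66, 45, 43, 41]

form_7_mean = mean_easiness(form_7)  # the table prints 59.83 for Form 7
form_8_mean = mean_easiness(form_8)  # the table prints 56.83 for Form 8
```

The same relation can be used to cross-check partly legible rows elsewhere in the table, since the six grade-span values of a form should sum to about six times its printed mean.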






Table 8.6 (Continued)

Easiness: Percent of Responses Correct

       Mean
       easiness                                Grade
Form   grades 4-6   Passage               4    5    6   4-6

  13   69.83        03-05-01-01-01-009   86   89   92   89
                    03-07-01-01-01-0??   82   92   93   89
                    04-09-01-01-05-037   61   71   78   70
                    06-11-01-01-03-015   60   74   80   72
                    07-13-01-01-01-009   36   48   57   47
                    09-13-01-01-01-001   45   50   62   52

  14   67.00        03-06-01-01-03-02?   78   85   83   83
                    04-08-01-01-01-022   61   75   79   72
                    05-09-01-01-04-0?1   72   83   88   82
                    06-11-01-01-01-012   59   70   68   66
                    07-13-01-01-03-013   46   55   58   53
                    09-15-01-01-02-004   35   --   51   46

  15   64.33        03-05-01-01-02-0?1   77   84   88   83
                    0?-08-01-01-01-025   65   75   79   74
                    04-09-01-01-05-038   51   --   71   61
                    06-11-01-01-04-019   51   68   71   64
                    07-13-01-01-05-015   --   61   65   57
                    07-15-01-01-01-02?   --   --   53   47

  16   70.50        03-06-01-01-01-015   87   91   95   91
                    04-08-01-01-04-028   71   80   87   80
                    04-09-01-01-01-031   70   83   85   80
                    06-11-01-01-03-017   44   60   67   58
                    06-13-01-01-02-029   52   70   78   67
                    08-16-01-01-01-013   --   --   49   47

  17   71.67        03-06-01-01-01-016   --   81   88   83
                    05-07-01-01-04-006   84   85   92   87
                    05-09-01-01-01-017   71   70   82   75
                    06-11-01-01-02-013   74   70   85   76
                    06-13-01-01-02-030   48   55   69   57
                    07-15-01-01-05-024   42   49   63   51

  18   60.67        03-05-01-01-01-002   83   82   87   84
                    04-07-01-01-03-013   68   63   68   68
                    05-10-01-01-01-025   64   70   75   69
                    05-11-01-01-01-030   44   57   62   56
                    07-13-01-01-01-008   48   62   63   60
                    0?-15-01-01-0?-026   19   30   30   27
Table 8.6 (Continued)

Easiness: Percent of Responses Correct

       Mean
       easiness                                Grade
Form   grades 4-6   Passage               4    5    6   4-6

  19   69.67        (illegible)          82   83   87   84
                    (illegible)          71   75   79   73
                    (illegible)          74   77   --   70
                    (illegible)          64   70   77   70
                    (illegible)          55   67   60   61
                    (illegible)          --   55   53   --

  20   70.00        03-05-01-01-0?-013   80   88   97   89
                    03-07-01-01-0?-027   82   85   89   85
                    05-10-01-01-01-023   70   81   93   82
                    06-11-01-01-03-016   62   75   84   74
                    08-14-01-01-03-003   35   46   56   46
                    08-15-01-01-01-005   34   46   52   44

  21   69.17        04-06-01-01-01-002   81   88   86   86
                    (illegible)          --   79   --   79
                    05-09-01-01-02-013   76   86   86   83
                    06-11-01-01-02-014   58   70   74   68
                    06-13-01-01-01-027   41   59   55   53
                    08-16-01-01-04-017   40   43   52   46

  22   66.00        03-06-01-01-01-01?   82   84   92   86
                    (illegible)          --   78   85   80
                    04-09-01-01-01-030   67   71   78   72
                    07-11-01-01-05-002   53   55   66   58
                    07-13-01-01-01-006   34   61   70   --
                    08-15-01-01-01-007   31   40   42   --

  23   69.50        03-05-01-01-03-014   80   87   91   86
                    05-07-01-01-01-001   80   86   89   85
                    05-09-01-01-01-012   76   84   88   83
                    06-11-01-01-05-020   --   61   69   63
                    07-14-01-01-0?-017   42   53   --   52
                    0?-15-01-01-01-021   40   48   55   48

  24   64.83        0?-06-01-01-02-004   70   --   83   80
                    04-07-01-01-05-017   69   86   81   79
                    04-09-01-01-01-033   66   78   78   74
                    07-12-01-01-02-004   48   64   63   59
                    08-13-01-01-03-002   38   57   55   50
                    08-15-01-01-01-009   41   --   --   47
Table 8.6 (Continued)

Easiness: Percent of Responses Correct

       Mean
       easiness                                Grade
Form   grades 7-9   Passage               7    8    9   7-9

  25   61.83        (four passage rows illegible)
                    09-20-01-01-01-024   41   54   50   48
                    10-22-01-01-01-030   43   61   45   --

  26   61.17        (four passage rows illegible)
                    09-19-01-01-05-023   40   50   59   51
                    10-22-01-01-01-032   39   44   46   43

  27   65.17        (four passage rows illegible)
                    09-19-01-01-01-020   62   68   66   65
                    10-22-01-01-01-031   52   57   50   53

  28   67.50        (four passage rows illegible)
                    09-20-01-01-05-027   51   64   65   59
                    10-21-01-01-01-026   43   46   47   45

  29   66.00        (five passage rows illegible)
                    10-22-01-01-0?-005   46   47   39   44

  30   71.83        06-12-01-01-05-026   82   87   93   87
                    07-14-01-01-01-016   72   77   83   77
                    08-1?-01-01-05-018   78   86   93   83
                    (illegible)          60   77   75   71
                    10-20-01-01-05-023   54   68   67   63
                    10-21-01-01-01-025   38   58   46   --
Table 8.6 (Continued)

Easiness: Percent of Responses Correct

       Mean
       easiness                                Grade
Form   grades 7-9   Passage               7    8    9   7-9

  31   66.83        06-11-01-01-03-013   91   89   91   91
                    07-13-01-01-03-013   75   76   79   77
                    08-15-01-01-02-010   54   64   63   60
                    08-17-01-01-01-030   54   62   62   --
                    10-19-01-01-03-015   60   60   52   --
                    10-22-01-01-02-034   54   61   54   --

  32   61.50        07-12-01-01-03-005   80   83   83   82
                    07-13-01-01-01-007   54   62   66   --
                    08-15-01-01-01-004   --   71   65   --
                    08-17-01-01-01-019   --   70   75   67
                    10-20-01-01-05-024   --   55   62   55
                    10-22-01-01-02-033   --   41   42   40

  33   62.50        06-11-01-01-01-011   76   82   86   81
                    (illegible)          73   83   87   81
                    (illegible)          --   79   82   77
                    (illegible)          47   61   63   57
                    (illegible)          46   59   55   53
                    (illegible)          26   25   27   26

  34   64.00        06-11-01-01-03-017   68   73   81   73
                    06-13-01-01-02-028   --   81   87   82
                    08-16-01-01-01-029   55   61   71   62
                    10-17-01-01-01-001   58   62   62   60
                    10-19-01-01-05-016   51   62   61   57
                    10-21-01-01-01-028   46   53   50   50

  35   63.50        06-12-01-01-03-023   66   71   77   71
                    06-13-01-01-04-032   77   84   89   83
                    08-16-01-01-01-014   60   67   69   63
                    10-18-01-01-01-007   43   55   59   52
                    10-20-01-01-03-022   44   60   57   53
                    10-20-01-01-01-017   52   64   54   57

  36   69.50        06-11-01-01-01-012   81   85   90   86
                    07-13-01-01-01-006   80   82   91   84
                    09-15-01-01-01-003   53   60   72   61
                    10-18-01-01-01-009   67   73   78   72
                    10-20-01-01-01-019   61   60   71   64
                    10-19-01-01-05-014   45   53   55   50
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Table 



minm o£ J'a^saps on Multlpli-ehsisi Cloze EKeKiisSi Leval I 



Passage

01-01-01-01-01-001
01-01-01-01-01-002
01-01-01-01-01-003
01-01-01-01-01-005
01-01-01-01-01-006
01-01-01-01-01-00?
01-01-01-01-01-008
01-01-01-01-01-009
01-01-01-01-01-0??
01-01-01-01-01-015
01-02-01-01-01-018
01-02-01-01-01-021
01-02-01-01-01-023
01-02-01-01-01-026
01-02-01-01-01-027
01-02-01-01-01-036
01-02-01-01-01-0?7
01-02-01-01-01-039
01-02-01-01-01-040
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47 

46 
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38 

Ji 
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33 
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53 
57 
50 

32 



31 



52 
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64 
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70 



76 

72 
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61 
62 
60 
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71 
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r-; 064241-OM3-CW 




87 


fl642«0l^OU05*O4O 






B5 
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§? 
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vH 0643-01*Dl-Q4*Ci2 






33 






84 


'- vnQ j 1 f^i 

07434I^OU03*Oi)o 


.61 






0743 Jl*Uli'01*CiJ/ 






16 


' AT 4i rj=i rti 


: 


17 
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81 










fli 
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77 








71 




0744*01-0141411 ' 


1 
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n«*i4-Ql40l 43^03 

yg "i^^f "y A* V4 i^y J "*w if 












72 






61 
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08.1S-fll.Ol.Ol>.0O8 








, 0845-01.01 4M10 




77 




' 08.13-01.0145*811 




7S 




•:, o?->li*Dl»Ol .01-002 












71 




OWii'Ol-OL'Ol-Mi 










' , ' 65 










70 










85 


:y: M|uiii.ri.iii.i!iUiii 









90* 



91

Easiness: Percent of Responses Correct

20-29   30-39   40-49   50-59

59 



57 



SI 

38 
57 



19 
57 

51 
53 



50 



16 



49 
44 



53 



Table 8.8

Item Deviancy on Multiple-Choice Cloze Exercises, Level I







Fit 


lirt 




Fassaga 




Item nean 






Pssgape lasMii 


Itm 


eaiinisi i^m 






no nE hi ^< rti flnl ^ti 




71 
ill 








18 


,31 


a 






20 


,40. 


■ 3 ■ 


Delitifl foirf Ibavs |Mi livsl of 










passagei ititfietot (ft) ifflianticaU^ 










pliuiitli 


OW7-01-Ol*02-012 #46 


2§ 


*34 


3 


OiitEictoi (1) imincieally pliusihli 


O4^09-01.*01*05-03? 132 


32 


' ,13 ■ 


3 


Mion 




33 


.13 


1 


Insuffietet eonfceKtutl elues 


03-05-01*01*01-00? .51 


15 


.38 5J7 


3 


DiatEiGtoi: (i) sminticiLly plauittli 




la 


.40 


2 


Distiietoii iWinticilly piiuiiblfl 










in niTO oontixt; 


04*08-01*01*01-020 ,35 


24 


.22 


2 


DisErictor (a) geraintieally pUusiWl 


OJ«09-01-Oi*01-01& ,40 


32 


,64 


2 






40 


.28 


3 


Diitraetor («) lananfeicaUy pUuaifek 




41 


.25 


1 


Uim 


01-03-01.01*01-033 ,66 


7 


,82 


1 


Iltli, cuaa coWict atiiw it 




a 


,53 


'J 


Diitrictof {%] Sinifttieally plauilKl 










In mmv imtmt y 


O3-06>0i-Ol-01-01S ,45 


10 


.5? 


2 


Qmrn- BumUtim of words 




11 


t32 


2: , 


Insuffisiint contixcuii cluei' ■■ 


O4*Oa«Ol-Ol-Q2-026 ,43 


2^ 


,30 


1 




O34O-O1-O1-Q1-02S ,41 


32 




1 


Gmon asioeiitiQn of wordi 



Table 8.8 (Continued)



Foan 
9 



Passage 



1 Itm ill 



Fit Fart 
iiin of 



InEerpraEatiQn 



05-08-01-01-01-I 



04-09-Oi.01«Q3«Q3i »40 



*71 


8 ,83 


1 


Qmm assottiition of wtds ' 




,10 ■ "■"i55 
13 ' ,76 


2 




«T f*" w 


1 


Title euis eoirict aniwis 




13 .43 


8,44 1 


Iniuff iciint contextuil clussi 
dlil*we£ori (s) and (d) linantieally 
pliiuiils 




23 ,69 


1 


f itii eusi eojfget answer 




28 .24 
31 ,3? 


3 
2 


Innflicibii : 
InaifflciiTit contsxtuil cluis 


»40 


32 .54 

33 .61 


1 

3 


Qmm ii,^ociation of wordi 
Coianoii iiioeiation ef wrts 




36 *2? 


1 


' Diifirictor (i) siflinticiUy plaustblsi 



37 
38 
40 



.24 
,3i 
.15 



15.19 



prioj lictuai knowledge tequired 
Inisflieabli . 
Ooroofii iisociition of wrii 
Delatid wo^id colloquial ei^iressioii 
Inipppopfiati to paisagi 
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1 

Table 8.8 (Continued)

Item Deviancy on Multiple-Choice Cloze Exercises, Level II

                           Passage         Item      Fit mean  Part of
Form  Passage              easiness  Item  easiness  square    speech   Interpretation

14    04-08-01-01-01-022   .?1       ??    .??                 1        Title cues correct answer
                                     19    .50       3.23      4        Difficult sentence construction
      06-1?-01-01-01-012   .66       31    .84                 3        Title cues correct answer
                                     35    .45       6.98      4        Distractor (b) semantically plausible;
                                                                        insufficient contextual clues
      07-1?-01-01-03-013   .53       42    .??                 2        Common association of words
                                     46    .37                 1        Specialized word usage
                                     47    .36       3.94      4        Difficult sentence construction
      09-15-01-01-02-004   .46       37    .31                 1        Prior factual knowledge required
                                     59    .29                 2        Typographical error in context (word
                                                                        omitted)

15    03-05-01-01-02-001   .83       4     .44                 3        Inexplicable
                                     7     .64                 3        Distractor (c) semantically plausible
      04-08-01-01-01-025   .74       18    .59       5.93      1        Distractor (b) semantically plausible
                                                                        in narrow context
                                     20    .39       5.27      1        Distractor (?) semantically plausible
                                                                        in narrow context
      04-09-01-01-05-038   .61       22    .80                 1        Common association of words
                                     25    .29       4.59      1        Difficult sentence construction
                                     28    .35                 4        Disorganized passage; distractor (c)
                                                                        semantically plausible in narrow context
      06-11-01-01-04-019   .64       39    .79                 2        Common association of words
      07-13-01-01-05-015   .57       41    .41                 3        Inexplicable
                                     42    .36                 4        Idiom; distractor (b) semantically
                                                                        plausible (colloquialism)
      07-15-01-01-01-020   .47       52    .28       4.86      2        Inexplicable

Table 8.8 (Continued)

                           Passage         Item      Fit mean  Part of
Form  Passage              easiness  Item  easiness  square    speech   Interpretation

18    ??                   .84       1     .66                 2        Distractor (b) semantically plausible
                                                                        in narrow context
                                     4     .55       8.08      1        Insufficient contextual clues;
                                                                        distractor (b) semantically plausible
      04-07-01-01-03-013   .68       11    .94                 2        Common association of words
                                     12    .41       10.04     2        Insufficient contextual clues
                                     13    .87                 1        Common association of words
                                     15    .85                 3        Common association of words
                                     17    .83                 3        Common association of words
                                     18    .30                 2        Insufficient contextual clues
                                     19    .37       8.46      1        Insufficient contextual clues
      05-10-01-01-01-025   .69       22    .84                 1        Common association of words
                                     23    .84                 1        Common association of words
                                     25    .37                 2        Difficult sentence construction
      05-11-01-01-01-030   .56       31    .8?                 1        Common association of words
                                     35    .26       4.73      2        Distractor (c) semantically plausible
                                                                        in narrow context
                                     36    .36       8.81      1        Specialized word usage
                                     37    .39                 2        Distractors semantically plausible in
                                                                        narrow context
                                     39    .77                 1        Common association of words
      ??                   .60       48    .26                 1        Idiom
      08-15-01-01-05-026   .2?       52    .12                 3        Typographical error in context
                                     55    .48                 3        Common association of words

Table 8.8 (Continued)

                           Passage         Item      Fit mean  Part of
Form  Passage              easiness  Item  easiness  square    speech   Interpretation

24    ??                   .80       8     .36       5.09      1        Inexplicable
      ??                   .79       17    .64                 1        Idiom
      ??                   .59       33    .43                 3        Insufficient contextual clues
                                     37    .14       5.34      3        Insufficient contextual clues
      08-13-01-01-03-002   .50       44    .20       10.49     3        Inexplicable
                                     46    .73                 1        Common association of words
      08-15-01-01-01-009   .47       51    .20       ??        3        Distractor (?) semantically plausible;
                                                                        deleted word above grade level of
                                                                        passage
                                     53    .68                 2        Common association of words
                                     56    .29       4.69      3        Insufficient contextual clues
                                     ??    .32       3.86      ??       Difficult sentence construction;
                                                                        deleted word above grade level of
                                                                        passage
                                     60    .66                 ??       Idiom



Table 8.8 (Continued)

Item Deviancy on Multiple-Choice Cloze Exercises, Level III

                           Passage         Item      Fit mean  Part of
Form  Passage              easiness  Item  easiness  square    speech   Interpretation

25    05-11-01-01-04-033   .?4       2     .93       3.10      2        Common association of words
                                     3     .49                 1        Insufficient contextual clues
                                     4     .94                 1        Common association of words
                                     6     .31       6.13      3        Deleted word above grade level of
                                                                        passage
                                     9     .45       3.00      1        Distractor (d) semantically plausible
                                                                        in narrow context
      08-14-01-01-03-003   .61       11    .89                 4        Common association of words
                                     12    .3?       3.64      1        Difficult sentence construction
                                     13    .80                 1        Common association of words;
                                                                        syntactically implausible distractors
                                     16    .40                 1        Difficult sentence construction; deleted
                                                                        word above grade level of passage
                                     18    .84                 2        Common association of words;
                                                                        syntactically implausible distractors
                                     19    .15       22.15     1        Distractors semantically plausible in
                                                                        narrow context
                                     20    .85                 1        Common association of words;
                                                                        syntactically implausible distractors
      08-1?-01-01-02-016   ??        23    .50                 2        Distractor (b) semantically plausible
      09-1?-01-01-01-030   .69       31    .25                 1        Distractor (a) semantically plausible;
                                                                        insufficient contextual clues
      09-1?-01-01-01-024   ??        41    .19                 2        Inexplicable
                                     42    .73                 3        Common association of words
                                     44    .2?                 1        Difficult sentence construction; deleted
                                                                        word above grade level of passage
      10-22-01-01-01-030   .49       38    .23                 3        Difficult sentence construction;
                                                                        difficult words in context

Table 8.8 (Continued)

                           Passage         Item      Fit mean  Part of
Form  Passage              easiness  Item  easiness  square    speech   Interpretation

26    06-12-01-01-03-024   .74       10    .44       3.22      3        Distractor (b) generally associated
                                                                        with word in context
      0?-13-01-01-01-010   .86       12    .60                 1        Difficult sentence construction
      0?-15-01-01-02-022   .72       21    .52                 2        Difficult sentence construction
      10-18-01-01-01-008   .41       33    .63                 2        Common association of words
                                     35    .23       3.??      1        Difficult sentence construction;
                                                                        deleted word above grade level of
                                                                        passage
                                     38    .21       8.03      1        Difficult sentence construction
                                     40    .61                 2        Common association of words
      09-19-01-01-03-023   .51       41    .73                 1        Common association of words
      10-22-01-01-01-032   .43       31    .61                 3        Common association of words

30    06-12-01-01-05-026   .87       4     .52                 2        Difficult sentence construction
      08-18-01-01-05-024   .71       33    .52       4.94      3        Difficult sentence construction
                                     36    .48                 1        Difficult sentence construction
      10-20-01-01-05-023   .63       43    .??                 3        Idiom
                                     50    .43                 3        Specialized word usage
      10-21-01-01-01-025   .48       ??    .11                 2        Difficult sentence construction;
                                                                        distractor (a) semantically plausible

Table 8.8 (Continued)
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Fit Part 
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Pagsagg easinaiE 


' Ti" fain 


q3 fl0 £6 


?ctuaffi SDSfich 




nfi ii-ni-ni-ni*(ll 2 iflfi 


5 


M 


4 


insufficient contsxtual clusi 




11 


9 JU 


4.61 4 


Inexolicabli 






.41 

8 TJi 


8.86 1 


Insufficianfc contixttial elueil 










difficiiit iintsncs conitruction 






.25 


§.00 1 


Difficult santanci cenitiuctiofl 




25 


.8? 


1 


Titlt cuts corrict mint 




27 




1 


Oomon aiioQiation of wocdi 




3§ 


i31 


2 


Diff^ult isntinci constmction 


10-2Q-01-Q1-01-019 464 


SO 


M 


1 


Difficult sentincs conitEUCtioti 






.24 


2 


Dilitid word difficult bacau la of 




52 


.75 


1 


specializad and datid usage 
Idiffln 




as 


.27 


1 


Dsletid word difficult becauii of 










, specialised and datid usage 
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Table 8.9 



Deviant Items by Part of Speech, Level I

       Part of                Number of        Proportion of
Form   speech       Total     deviant items    deviant items

??     Noun           20           1               .05
       Verb           15           2               .13
       Adjective       6           3               .50
       Adverb          0           0               0

3      Noun           17           1               .06
       Verb           17           3               .18
       Adjective       6           4               ??
       Adverb          1           ??              ??

8      Noun           15           3               ??
       Verb           18           3               .17
       Adjective       3           ??              ??
       Adverb          3           ??              ??

9      Noun           25           7               .28
       Verb           12           3               .25
       Adjective       4           3               .75
       Adverb          0           ??              ??

Totals by Level

       Noun           77          12               .16
       Verb           62          11               .18
       Adjective      19           8               .42
       Adverb          4           ??              ??
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Table 8.9 (Continued) 
Deviant Items by Part of Speech, Level II

       Part of                Number of        Proportion of
Form   speech       Total     deviant items    deviant items

14     Noun           23           4               .17
       Verb           21           2               .10
       Adjective      12           1               .08
       Adverb          4           3               .75

15     Noun           24           4               .17
       Verb           22           2               .09
       Adjective       9           3               .33
       Adverb          5           2               .40

18     Noun           23           9               .39
       Verb           22           7               .32
       Adjective      11           4               .36
       Adverb          4           ??              ??

24     Noun           25           4               .16
       Verb           21           2               .10
       Adjective      12           5               .42
       Adverb          2           ??              ??

Totals by Level

       Noun           95          21               .22
       Verb           86          13               .15
       Adjective      44          13               .30
       Adverb         15           5               .33
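
The proportion column in Table 8.9 appears to be the number of deviant items divided by the total number of items for each part of speech. As a minimal sketch under that assumption, the Python snippet below recomputes the Level II "Totals by Level" row from the counts printed in the table. The names level_ii_totals and deviant_proportion are illustrative only and do not come from the report.

```python
# Level II "Totals by Level" counts as printed in Table 8.9:
# part of speech -> (total items, number of deviant items)
level_ii_totals = {
    "Noun": (95, 21),
    "Verb": (86, 13),
    "Adjective": (44, 13),
    "Adverb": (15, 5),
}


def deviant_proportion(total, deviant):
    """Assumed definition: deviant items / total items, to two decimals."""
    return round(deviant / total, 2)


# Recompute the proportion column for each part of speech.
proportions = {
    pos: deviant_proportion(total, deviant)
    for pos, (total, deviant) in level_ii_totals.items()
}

for pos, p in proportions.items():
    print(f"{pos:<10} {p:.2f}")
```

Under this assumption the recomputed values agree with the printed proportions (.22, .15, .30, .33), which suggests the same arithmetic can be used to cross-check cells that are illegible in the scan.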
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Table 8.9 (Continued) 



Deviant Items by Part of Speech, Level III

       Part of                Number of        Proportion of
Form   speech       Total     deviant items    deviant items

25     Noun           29          10               .34
       Verb           14           4               .29
       Adjective      13           3               .23
       Adverb          4           1               .25

26     Noun           22           3               .14
       Verb           26           4               .15
       Adjective       7           2               .29
       Adverb          5           ??              ??

30     Noun           31           ??              .03
       Verb            9           ??              .22
       Adjective      16           ??              .19
       Adverb          4           ??              ??

36     Noun           31           ??              .23
       Verb           13           ??              .15
       Adjective       7           ??              ??
       Adverb         10           ??              .20

Totals by Level

       Noun          113          21               .19
       Verb           62          12               .19
       Adjective      43           8               .19
       Adverb         23           3               .13
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Table 8.10 

Analysis of Multiple-Choice Cloze Items by Part of Speech

                    Number                            Fit mean   Point biserial
       Part of      of        Easiness   Difficulty   square     correlation
Form   speech       items     average    average      average    average

??     Noun          20        .54        -.12         1.27       .56
       Verb          15        .54        -.18         2.33       .47
       Adjective      6        .37         .85          .88       .54
       Adverb         0        0           0           0          0

3      Noun          17        .54        -.06         2.38       .56
       Verb          17        .56        -.26         1.04       .53
       Adjective      6        .39         .88         2.11       .55
       Adverb         1        .51         .11          .71       .61

8      Noun          15        .52        -.18         1.38       .56
       Verb          18        .51        -.07         1.36       .52
       Adjective      3        .35         .85         1.21       .55
       Adverb         3        .42         .45         2.33       .50

9      Noun          25        .56        -.15         2.51       .58
       Verb          12        .53         .12         2.71       .53
       Adjective      4        .48         .43         2.00       .53
       Adverb         0        0           0           0          0

14     Noun          23        .64         .11         1.84       .53
       Verb          21        .70        -.19         1.48       .53
       Adjective     12        .70        -.24         1.36       .55
       Adverb         4        .51        1.07         4.31       .44

15     Noun          24        .68        -.28         2.28       .43
       Verb          22        .64        -0.00        1.80       .49
       Adjective      9        .61         .32         2.34       .44
       Adverb         5        .53         .78         1.78       .43

Table 8.10 (Continued)

                    Number                            Fit mean   Point biserial
       Part of      of        Easiness   Difficulty   square     correlation
Form   speech       items     average    average      average    average

18     Noun          23        .61        -.03         ??         ??
       Verb          22        .61        -.08         2.26       .38
       Adjective     11        .63         ??          1.04       .47
       Adverb         4        .50         ??          1.79       .44

24     Noun          25        .69        -.28         1.66       .53
       Verb          21        .67        -.10         1.69       .52
       Adjective     12        .53         ??          3.94       .41
       Adverb         2        .66         ??          1.91       .48

25     Noun          29        .58         .20         3.13       .40
       Verb          14        .66        -.30         2.00       ??
       Adjective     13        .61         ??          ??         .52
       Adverb         4        .74         ??          ??         .46

26     Noun          22        .38         .27         2.20       .44
       Verb          26        .60         .03         1.59       .44
       Adjective      7        .63         ??          ??         .43
       Adverb         5        .80         ??          ??         .38

30     Noun          31        .78         ??          2.24       .47
       Verb           9        .64         .67         3.63       .44
       Adjective     16        .65         .84         1.98       .55
       Adverb         4        .69         .23         2.35       .45

36     Noun          31        .69         .05         2.08       .50
       Verb          13        .66        -.26         1.52       .50
       Adjective      7        .73        -.25         1.19       .50
       Adverb        10        .75        -.30         1.60       .41
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